# The Psychology of Human Thought: An Introduction

Robert J. Sternberg & Joachim Funke (Eds.)

#### Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

This book is published under the Creative Commons Attribution-ShareAlike 4.0 License (CC BY-SA 4.0). The cover is subject to the Creative Commons Attribution-NoDerivatives 4.0 License (CC BY-ND 4.0).

The electronic, open access version of this work is permanently available on Heidelberg University Publishing's website: HEIDELBERG UNIVERSITY PUBLISHING (https://heiup.uni-heidelberg.de). URN: urn:nbn:de:bsz:16-heiup-book-470-3 DOI: https://doi.org/10.17885/heiUP.470

Text © 2019 by the authors.

Cover image: Matthias Ripp, Gentrification (detail), https://www.flickr.com/photos/56218409@N03/16288222673

ISBN: 978-3-947732-34-0 (Softcover) ISBN: 978-3-947732-35-7 (Hardcover) ISBN: 978-3-947732-33-3 (PDF)

# List of Contributors

Rakefet Ackerman Faculty of Industrial Engineering and Management Israel Institute of Technology Technion City, Haifa 3200003, Israel

Andrea Bender Department of Psychosocial Science University of Bergen 5020 Bergen, Norway

Arndt Bröder School of Social Sciences Department of Psychology University of Mannheim 68131 Mannheim, Germany

Janet Davidson Department of Psychology Lewis & Clark College Portland, OR 97219, USA

Jonathan St. B. Evans School of Psychology University of Plymouth Devon PL4 8AA, United Kingdom

Klaus Fiedler Department of Psychology Heidelberg University 69117 Heidelberg, Germany

Joseph P. Forgas School of Psychology University of New South Wales Sydney, NSW 2052, Australia

Joachim Funke Department of Psychology Heidelberg University 69117 Heidelberg, Germany

Kathleen M. Galotti Cognitive Science Program Carleton College Northfield, MN 55057, USA

David M. Garavito Department of Human Development Cornell University Ithaca, NY 14850, USA

Mary Gauvain Department of Psychology University of California, Riverside Riverside, CA 92521, USA

Judith Glück Institute of Psychology University of Klagenfurt Klagenfurt am Wörthersee, Austria

Arthur C. Graesser Department of Psychology and Institute for Intelligent Systems The University of Memphis Memphis, TN 38152, USA

Zach Hambrick Department of Psychology Michigan State University East Lansing, MI 48824, USA

Kenneth J. Kurtz Cognitive and Brain Sciences Psychology Department Binghamton University Binghamton, NY 13902, USA

Kimery R. Levering Department of Psychology Marist College Poughkeepsie, NY 12601, USA

Anne M. Lippert Department of Psychology and Institute for Intelligent Systems The University of Memphis Memphis, TN 38152, USA

Todd Lubart Institute of Psychology University of Paris Descartes (Université Sorbonne Paris Cité) Boulogne Billancourt, France

Julia Nolte Department of Human Development Cornell University Ithaca, NY 14850, USA

Valerie Reyna Department of Human Development Cornell University Ithaca, NY 14850, USA

Chiara Scarampi Institute of Cognitive Neuroscience University College London London, United Kingdom

Ulrich Schroeders Department of Psychological Assessment University of Kassel 34127 Kassel, Germany

Keith T. Shubeck Department of Psychology and Institute for Intelligent Systems The University of Memphis Memphis, TN 38152, USA

Robert J. Sternberg Department of Human Development Cornell University Ithaca, NY 14850, USA

Branden Thornhill-Miller Institute of Psychology University of Paris Descartes (Université Sorbonne Paris Cité) Boulogne Billancourt, France & Department of Philosophy University of Oxford Oxford, United Kingdom

Lisa von Stockhausen Department of Psychology University of Duisburg-Essen 45141 Essen, Germany

Oliver Wilhelm Department of Individual Differences and Psychological Assessment Ulm University 89069 Ulm, Germany

# Contents

11 Nature of Language

16 Wisdom

17 Development of Human Thought

18 Affect and Thought: The Relationship Between Feeling and Thinking

19 Culture and Thought

This book is dedicated to Dietrich Dörner (Bamberg, Germany) and the late Alexander J. Wearing (Melbourne, Australia), two research pioneers of human thought in complex and dynamic situations.

# Preface

On a sunny day in summer 2016, the two editors (RJS and JF) were sitting in a café on the Hauptstrasse near the Psychology Department of Heidelberg University. When the discussion moved to the topic of textbooks, RJS asked JF if he would be interested in coediting a textbook on the psychology of human thought. There are not many recent competitors, RJS noted. JF agreed that contemporary textbooks in the field of human thought are truly hard to find.

Soon the idea emerged to produce an "open-access" textbook that could be used, free of charge, by students all over the world. The newly founded publishing house, "Heidelberg University Publishing" (HeiUP), seemed to be a perfect platform for this idea. We wrote a proposal for the Editorial Board of HeiUP, which accepted our idea and soon gave us the go-ahead. We then looked for potential contributors for our chapters and obtained commitments from some of the world's leading experts in the field.

Although not every college or university teaches such a course, we believe that it is an extremely important course for any psychology major—or, arguably, anyone at all—to take. First, we know that even a high IQ does not guarantee that a person will think well in his or her everyday life. People commit cognitive fallacies, such as the sunk-cost fallacy (otherwise known as "throwing good money after bad"), every day. It is important for students to understand their lapses in thinking and to have ways of correcting them. Second, standard cognitive-psychology or cognitive-science courses only scratch the surface of the field of human thought. Such courses must cover a wide variety of other topics, such as perception, learning, and memory, and so cannot possibly go into any true depth on complex thought processes. Our textbook fills this gap. Third, we are seeing today how even leaders all over the world, individuals chosen to help guide whole countries into the future, often show astonishing and sometimes seemingly inexplicable lapses in their critical thinking. We all need to understand how such lapses can occur, especially when people are under stress, and how they can be corrected. We hope, therefore, that you profit as much from this course as we both did from taking similar courses when we were younger.

## The Content

This idea for an edited textbook, *The Psychology of Human Thought: An Introduction*, is motivated by our view that much of the "action" in psychological science today involves the study of human thought (as witnessed by the success of books such as Daniel Kahneman's *Thinking, Fast and Slow*, 2011, and of Steven Pinker's *The Stuff of Thought*, 2007, both of which became best sellers). The excitement of the field notwithstanding, we were able to find only two textbooks on the topic of human thought (Manktelow, 2012; Minda, 2015). Yet, a course on "Thinking" (or any of its related course names) is one of the most exciting in psychology. Such a course, taught at the undergraduate level by the late Professor Alexander Wearing, was part of what motivated RJS to enter the field of complex cognition. Because of the scarcity of recent textbooks covering the broad range of this field, it seemed timely to present a new one edited and authored by experts in the field of human thought.

## For Whom This Book Is Written

This volume is intended as a primary or secondary textbook for courses on what we call "The Psychology of Human Thought", which can take a number of different names, such as The Psychology of Human Thought, Thinking, Reasoning, Problem Solving, Decision Making, Complex Processes, Higher Processes, Complex Cognition, Higher Cognition, or similar titles.

The course is usually taught at the third (college junior) undergraduate level, or one level higher than courses on Cognitive Psychology. Many students with an interest in cognition take the cognitive-psychology or cognitive-science course first, followed by the more advanced course on human thought.

## How to Use This Book

The chapters describe the specific topics of the field in terms of theories, research, and applications. Among the pedagogical elements in the book, representations of information help students understand the material better.


## Conclusion

We hope that you enjoy this overview of the psychology of human thought. If you have any comments or suggestions, please send them to the editors at robert.sternberg@cornell.edu or joachim.funke@psychologie.uni-heidelberg.de

The editors thank the very supportive team from Heidelberg University Publishing, especially Maria Effinger, Daniela Jakob, Anja Konopka, and Frank Krabbes. Claire Holfelder and David Westley did a wonderful job checking the language of non-native authors. We were also lucky to have one of the best (and fastest!) copyeditors we could think of: Julia Karl. Thanks a lot for your invaluable help, Julia! It was fun to work with you!

> R.J.S. & J.F. Ithaca, NY, USA & Heidelberg, Germany Summer 2019



# Chapter 1

# The Psychology of Human Thought: Introduction

ROBERT J. STERNBERG & JOACHIM FUNKE

Cornell University & Heidelberg University

The psychology of human thought deals with how people mentally represent and process complex information. For example, if you imagine an object rotating in space, you might represent the rotating object as an image of the object, or as a series of propositions that specify the characteristics of the object and its successive positions in space. A psychological scientist who studies human thought might investigate how people solve complex problems, or make decisions, or learn language, or use reasoning to decide whether the claims of a politician are true. Why do people find it easier to reason when the content of what they are reasoning about is familiar rather than unfamiliar, but why, at the same time, are they more likely to make an error in reasoning when the content is familiar? Why are people more afraid to travel in airplanes than in cars, even though, statistically, riding in a car is far more dangerous than riding in an airplane? Why do people view a robin or a bluebird as more "like a bird" than an ostrich or a penguin, even though all are birds? These are the kinds of questions that psychological scientists address when they study the psychology of human thought.

# 1.1 Goals of Research

Research in the psychology of human thought takes many forms, but it generally follows a common sequence. We will illustrate this sequence with regard to the purchase of a new bicycle.

Suppose you are trying to figure out how people decide on a brand of bicycle (or anything else!) they would like to buy. How do they think about this problem? As a psychological scientist, you might start thinking about the issue by informally considering some of the ways in which people might make such a decision (see, e.g., Gigerenzer, 2015; Kahneman, 2013; Reyna, Chapman, Dougherty, & Confrey, 2011). Here are some strategies that a potential bicycle-buyer might use:


Of course, there are other possibilities, but suppose, for the purposes of this chapter, you consider just these three possibilities. You might then create a theory—an organized body of general explanatory principles regarding a phenomenon. For example, your theory might be that, in the end, people avoid complication and make their decisions only on the basis of the most important factor in a decision (see Gigerenzer, 2015). Then you might propose a hypothesis—a tentative proposal of expected empirical consequences of the theory, such as of the outcome of research. So here, your hypothesis is that if you offer people a series of bicycles, and know their preferences regarding aspects of a bicycle, their decision as to which one to buy will depend only on the single feature that is most important to them. Now you might design an experiment—a set of procedures to test your hypothesis (or hypotheses). In the experiment, you might ask people about the features that matter to them, how important each feature is, and then, which of several bicycles they would choose, assuming they had a choice. You then would do data analysis—statistically investigating your data to determine whether they support your hypothesis. You then could draw at least tentative conclusions as to whether your theory was correct.
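The hypothesized single-feature strategy can be expressed as a short decision rule. The sketch below is purely illustrative: the feature names, importance weights, and bicycles are invented for this example, and real decision research would of course rely on empirical data rather than made-up numbers.

```python
# A minimal sketch of the hypothesized single-feature decision rule:
# pick the option that is best on the one feature the buyer cares
# most about, ignoring every other feature. All data are invented.

def choose_by_top_feature(options, importance):
    """Return the option scoring highest on the single most important feature."""
    top_feature = max(importance, key=importance.get)
    return max(options, key=lambda opt: opt["features"][top_feature])

# Hypothetical buyer: value for money matters far more than anything else.
importance = {"price_value": 0.9, "weight": 0.5, "looks": 0.2}
bikes = [
    {"name": "A", "features": {"price_value": 7, "weight": 9, "looks": 8}},
    {"name": "B", "features": {"price_value": 9, "weight": 5, "looks": 4}},
]

print(choose_by_top_feature(bikes, importance)["name"])  # prints "B"
```

Bike A is better on two of the three features, but the rule ignores them: only the single most important feature (here, value for money) decides, which is exactly what the hypothesis predicts.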

One thing to remember is that many scientists believe, following Karl Popper (2002), that you can only falsify ideas through experiments, never conclusively prove them. That is, even if the results of an experiment are consistent with your theory, it does not mean that all possible experiments testing the theory would be consistent with it. More likely, some would be consistent and others would not. However, if the results are not consistent with the theory, then perhaps you would want to move on to a new theory; or, alternatively, you would want to see whether the theory holds only under limited sets of circumstances.

# 1.2 Underlying Themes in the Study of Human Thought

Theories and research in the study of human thought tend to recycle through a set of underlying themes. What are some of the main themes that arise again and again in the study of higher cognition, such as in the exploration of human thought? To understand the psychology of human thought, you need to understand how these themes recur, over and over again (see Table 1.1). In the text and table, we refer to the two aspects of the themes as potentially complementary rather than contradictory. For example, almost all behavior will result from an interaction of genetic and environmental factors, rather than resulting solely from one or the other. For consistency, we will show how seven themes arise in a single area of research, human intelligence.

# 1.2.1 Nature and Nurture

One major issue in the study of human thought is the respective influences on human cognition of nature, on the one hand, and nurture, on the other. Scientists who believe that innate characteristics of human cognition, those due to *nature*, are more important may focus their research on innate characteristics; those who believe in the greater importance of the environment, attributes due to nurture, may choose to focus on acquired characteristics.

Perhaps nowhere has this issue played out more than in the study of human intelligence (see, e.g., Sternberg & Grigorenko, 1997). Intelligence researchers have argued for many years regarding the respective roles of genes and environment in intelligence, and two researchers with opposing points of view even wrote a book about their opposing stances (Eysenck & Kamin, 1981). At the time of their book, hereditarian and environmental viewpoints were viewed as in opposition to each other.

Today, scientists recognize that the picture is more complex than it appeared to be at that time. Most likely, genetic effects are not due to some "intelligence gene", but rather to many genes, each having very small effects (Tan & Grigorenko, in press). The genes that have been identified so far as possibly contributing to intelligence are of small effect, and their effects are sometimes difficult to replicate. It appears that environment plays an important role, often in conjunction with genes (Flynn, 2016). Some effects may be *epigenetic*, meaning that aspects of the environment may turn certain genes "on" and "off", either resulting in their commencing or ceasing, respectively, to affect development.

# 1.2.2 Rationalism and Empiricism

*Rationalist* investigators tend to believe that one can learn a lot about human behavior through reflection and introspection. *Empiricist* investigators believe in the necessity of data collection. The rationalist tradition dates back to the Greek philosopher Plato, whose ideas are discussed further in Chapter 2, "History of the Field of the Psychology of Human Thought".

Table 1.1: Major Themes in the Study of Human Thought.

In *The Theaetetus*, one of the Platonic dialogues, *Theaetetus* imagines that there exists in the mind of man a block of wax, which is of different sizes in different men. The blocks of wax can also differ in hardness, moistness, and purity. Socrates, a famous Greek philosopher, suggests that when the wax is pure and clear and sufficiently deep, the mind will easily learn and retain and will not be subject to confusion. It only will think things that are true, and because the impressions in the wax are clear, they will be quickly distributed into their proper places on the block of wax. But when the wax is muddy or impure or very soft or very hard, there will be defects of the intellect (*Great Books of the Western World*, 1987, 7, 540).

Plato's view of intelligence in terms of a metaphorical block of wax is the product of a rationalist approach: Obviously, he did not do any kind of formal experimentation to derive or test this point of view. Aristotle, another early Greek philosopher, in contrast, took a more empirical approach to understanding intelligence:

In the *Posterior Analytics*, Book I, Aristotle conceived of intelligence in terms of "quick wit":

> Quick wit is a faculty of hitting upon the middle term instantaneously. It would be exemplified by a man who saw that the moon has a bright side always turned towards the sun, and quickly grasped the cause of this, namely that she borrows her light from him; or observed somebody in conversation with a man of wealth and divined that he was borrowing money, or that the friendship of these people sprang from a common enmity. In all these instances he has seen the major and minor terms and then grasped the causes, the middle terms. (*Hutchins: Great Books of the Western World*, 1952, Vol. 8, p. 122)

Although in Aristotle's time no one did formal experiments, notice that Aristotle gives a genuine real-world example, presumably derived from his past experiences, whereas Plato's discussion in *The Theaetetus* was obviously hypothetically derived (or contrived).

Today, psychological scientists studying intelligence use an empirical approach. But rationalism still plays an important part. Many theories, when originally posed, are derived largely from the thinking processes of scientists. After the theories are proposed, they then are tested empirically, usually on human subjects, but sometimes by computer simulations or by other means. In the modern-day study of human thought, both rationalism and empiricism have a place.

# 1.2.3 Structures and Processes

*Structures* here refer to the contents, attributes, and relations between parts of the human mind. *Processes* refer to the actual operations of the human mind. Much of early research on human intelligence was structural. Theorists of intelligence argued, and to some extent still argue, about structural models of intelligence. For example, Charles Spearman (1927) believed that human intelligence can be characterized structurally by one general factor of the mind permeating our performance on all cognitive tasks, and then specific factors particular to each cognitive task. Louis Thurstone (1938) believed that there are seven primary mental abilities: verbal comprehension, verbal fluency, number, spatial visualization, inductive reasoning, perceptual speed, and memory. Today, theorists of intelligence still disagree, to some extent, about these structures. Two prominent models are the CHC (Cattell-Horn-Carroll) model (McGrew, 2005), which argues that there is a general factor of intelligence at the top of a hierarchy of abilities, with two strata below it, including fluid ability (the ability to deal with novel stimuli) and crystallized ability (world knowledge); and the Johnson-Bouchard (2005) g-VPR model, which argues instead that the three main abilities beneath general intelligence are verbal, perceptual, and image rotation. So even today there are disagreements about the structure of intellectual abilities, and the resolution of these disagreements is an active area of research.

Many of the issues today, however, revolve around process issues. Are there basic processes of intelligence, and if so, what are they?

In the latter part of the twentieth century, Earl Hunt (e.g., Hunt, 1980) proposed what he called a *cognitive correlates* approach to studying the relationship between intelligence and cognition—one would study typical cognitive tasks, such as the time an individual takes in naming a letter, and then look at the correlation between that time and scores on psychometric tests. In this way, Hunt thought, one could understand the basic cognitive building blocks of intelligence.
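The cognitive-correlates logic can be illustrated with a small computation: given each participant's time on a simple cognitive task and his or her psychometric test score, one computes the correlation between the two. The numbers below are invented purely for illustration; note that because faster naming means a *lower* time, a positive relationship between speed and intelligence shows up as a *negative* correlation.

```python
# Illustrative sketch of the cognitive-correlates approach:
# correlate a simple cognitive measure (letter-naming time, in ms)
# with a psychometric test score. All numbers are invented.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical participants: faster naming tends to go with higher scores.
naming_ms = [420, 380, 510, 450, 390, 470]
test_score = [112, 121, 96, 104, 118, 101]

r = pearson_r(naming_ms, test_score)
print(round(r, 2))  # strongly negative: faster naming, higher scores
```

In Hunt's actual research the correlations between such elementary task measures and intelligence-test scores were far more modest than in this toy example, but the computational logic is the same.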

Sternberg later proposed an alternative *cognitive components* approach (Sternberg, 1983, 1985), whereby intelligence could be understood in terms of components not of simple tasks, like identifying whether two letters are the same as each other, but rather of more complex tasks similar to those that appear on intelligence tests, such as analogies or syllogistic reasoning.

Today, many of the discussions regarding processes underlying intelligence concern working memory (Conway & Kovacs, 2013; Ellingsen & Engle, in press; Kane et al., 2004). Working memory appears to play an important part in processes of intelligence and is highly related to fluid intelligence (discussed above). Originally, it appeared that working memory is a, or perhaps the, crucial component of fluid intelligence (Kyllonen & Christal, 1990). But in their recent work, Engle and his colleagues have argued that working memory and fluid intelligence may in fact work separately but in conjunction—with working memory helping us remember what we need to remember and fluid intelligence helping us forget what we need to forget (Ellingsen & Engle, in press).

By the way, one of the first information-processing accounts of intelligence was offered by the same scholar who offered the theory of general intelligence (Spearman, 1923). Charles Spearman certainly was one of the most versatile as well as brilliant psychologists of the early twentieth century!

# 1.2.4 Domain-Generality versus Domain-Specificity

The concept of *domain-generality* refers to the notion that a cognitive skill or set of skills might apply across a wide variety of domains. The concept of *domain-specificity* refers to the notion that a cognitive skill or set of skills might apply only in a specific domain or, at most, a small set of domains. Of course, there is no uniformly agreed-upon definition of what constitutes a "domain." Is verbal processing a single domain, or are reading, writing, speaking, and listening separate domains?

Spearman (1927) suggested that the aspect of intelligence that we know best, general intelligence or what he called "*g*", is what matters most to people's ability to adapt to the environment. In extreme contrast, Howard Gardner (2011) has suggested that intelligence is highly domain-specific, indeed, that there are eight distinct and independent "intelligences"—linguistic, logical-mathematical, spatial, bodily-kinesthetic, musical, naturalist, interpersonal, and intrapersonal. He believes that any general intelligence is merely an artifact of the independent intelligences being used in conjunction in a multitude of tasks.

An intermediate information-processing perspective is taken by Sternberg (2011), who has argued that the basic information-processing components of intelligence are the same in all tasks—for example, recognizing the existence of a problem, defining the problem, mentally representing the problem, formulating a strategy to solve the problem—but that how well these processes are performed depends on the domain. That is, how well one can execute a given process depends on the domain in which the process is exercised.

# 1.2.5 Validity of Causal Inferences and Ecological Validity

The advantage of laboratory-based research with carefully controlled experimental conditions is that it promotes *validity of causal inferences*, that is, the extent to which scientists can establish causal bases for scientifically observed phenomena. Because scientists in the laboratory often can carefully control independent as well as confounding variables (i.e., variables that are not relevant to an experiment but that might affect the results, clouding conclusions to be drawn), the scientists can ensure, to the extent possible, that experimental effects are due to the variables they are supposed to be due to. But the potential disadvantage of laboratory experiments is that the conditions of testing may be rather remote from the conditions observed in everyday life. One of the most famous scientists to point this out was Ulric Neisser (1976), who argued that many of the results obtained in the laboratory do not apply well to real-world phenomena. *Ecological validity* refers to the generalizability of conclusions to the everyday contexts in which behavior of interest occurs.

Most formal research on intelligence is done in laboratories. The results tell us, for example, that most cognitive tasks tend to correlate positively with each other, meaning that if a person does well on one of them, he or she also will tend to do well on others. But Sternberg et al. (2001) found that, under some circumstances, an important adaptive cognitive task (procedural knowledge among rural Kenyan children of natural herbal medicines used to combat parasitic illnesses) correlated negatively with some of the cognitive tasks used in laboratories and classrooms to measure general intelligence. The point of the research was not that, in general, general intelligence correlates negatively with adaptive procedural knowledge (i.e., knowledge of how to accomplish tasks in real-world environments). Rather, the point was that the correlation depends on the circumstances—that we may be too quick to draw general conclusions from experimental contexts that are somewhat limited. Because the Sternberg et al. (2001) study was a field experiment conducted under challenging circumstances in rural Kenya, it would be difficult if not impossible to draw causal conclusions from the research. But the research might have a certain kind of ecological validity lacking in the more "sterile" environment of the psychologist's laboratory or even a carefully controlled classroom administration of a standardized test.

# 1.2.6 Basic Research and Applied Research

*Basic research* attempts to understand fundamental scientific questions, often by testing hypotheses derived from theories. It does not concern itself with how the research is used. *Applied research,* in contrast, seeks to apply scientific knowledge to problems in the world, often with the goal of solving those problems to make the world a better or at least a different place.

Human intelligence is an area that historically has had a lively mix of basic and applied research, not always with the most admirable of outcomes. The research that has yielded some of the theories of intelligence described above, such as *g* theory or the CHC theory, is basic. Applied research has often been in the form of research on intelligence testing, research following in the tradition of Alfred Binet and Theodore Simon (Binet & Simon, 1916), who invented the first "modern" intelligence test. The legacy of this research is mixed. On the one hand, Binet was hopeful that his work on intelligence could be used to create a kind of "mental orthopedics" that would help those who performed at lower intellectual levels to improve their performance. On the other hand, much of the applied research in the early years of the twentieth century was at least in part pejorative, seeking to demonstrate that people of some socially defined races or ethnicities were inherently more intelligent than others (see Fancher, 1987, and Gould, 1981, for reviews), usually according with some prior hypothesis about the superiority of the "white race" over other groups.

That said, there has also been applied research attempting to show that intelligence is at least, in some measure, modifiable in a positive way. For example, Feuerstein (1980) presented a program called *Instrumental Enrichment* that his data suggested could help improve the intelligence of those who were intellectually challenged by the kinds of tasks found on standardized intelligence tests. Sternberg, Kaufman, and Grigorenko (2008) presented a program, based on research originally done in Venezuela, for helping people improve their intelligence. Jaeggi et al. (2008) showed that at least some aspects of fluid intelligence might be susceptible to positive modification.

These various efforts show that applied research can serve either more or less positive purposes. Applied research is a useful way of putting science into practice, but it can either create electric bulbs that light up the world, or nuclear weapons that potentially can destroy that same world.

# 1.2.7 Biological and Behavioral Methods

There are many methods through which psychological scientists can investigate the psychology of human thought. Two classes of methods are *biological*, which involves studies of the brain and central nervous system, using methods such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET); and *behavioral*, which typically presents people with problems or questions for them to address. We have discussed behavioral research throughout the chapter. What does biologically-based research look like?

Some of the earliest biological research emphasized the analysis of hemispheric specialization in the brain. This work goes back to a finding of an obscure country doctor in France, Marc Dax, who in 1836 presented a little-noticed paper to a medical society meeting in Montpellier. Dax had treated a number of patients suffering from loss of speech as a result of brain damage. The condition, known today as aphasia, had been reported even in ancient Greece. Dax noticed that in all of more than 40 patients with aphasia, there had been damage to the left hemisphere of the brain but not the right hemisphere. His results suggested that speech and perhaps verbal intellectual functioning originated in the left hemisphere of the brain.

Perhaps the most well-known figure in the study of hemispheric specialization was Paul Broca. At a meeting of the French Society of Anthropology, Broca claimed that a patient of his who was suffering a loss of speech was shown postmortem to have a lesion in the left frontal lobe of the brain. At the time, no one paid much attention. But Broca soon became associated with a hot controversy over whether functions, particularly speech, are indeed localized in the brain. The area that Broca identified as involved in speech is today referred to as Broca's area. By 1864, Broca was convinced that the left hemisphere is critical for speech. Carl Wernicke, a German neurologist of the late nineteenth century, identified language-deficient patients who could speak but whose speech made no sense. He also traced language ability to the left hemisphere, though to a different precise location, which now is known as Wernicke's area.

Nobel Prize-winning physiologist and psychologist Roger Sperry (1961) later came to suggest that the two hemispheres behave in many respects like separate brains, with the left hemisphere more localized for analytical and verbal processing and the right hemisphere more localized for holistic and imaginal processing. Today it is known that this view was an oversimplification and that the two hemispheres of the brain largely work together (Gazzaniga, Ivry, & Mangun, 2013).

More recently, using positron emission tomography (PET), Richard Haier discovered that people who perform better on conventional tests of intelligence often show less activation in relevant portions of the brain than do those who do not perform as well (Haier et al., 1992). Presumably, this pattern of results reflects the fact that the better performers find the tasks easier and thus invoke less effort than do the poorer performers. P-FIT (parieto-frontal integration) theory, proposed by Rex Jung and Richard Haier (2007), holds that general intelligence is associated with communication efficiency between the dorsolateral prefrontal cortex, the parietal lobe, the anterior cingulate cortex, and specific temporal and parietal cortex regions.

Again, it is important to emphasize that biological and behavioral methods are not opposed to each other. In Haier's research, as in most contemporary biologically-based research, participants perform some kind of cognitive task and their behavior is recorded. What is different is that, while they perform the task, biological measurements are made, for example, by an fMRI machine in which the participants are embedded. So even biological research and behavioral research can combine in powerful ways to yield insights about human cognition.

# 1.3 Seven Themes Applied to Problem Solving

We believe that the seven themes are universal issues within a psychology of human thought. We have presented these themes in the context of intelligence, but to illustrate the usefulness of these distinctions in another exemplary domain, we have chosen the field of problem solving (see Chapter 9, "Problem Solving", for more details). We will go through the seven dichotomies and see whether they are useful in that domain too.

(1) *Nature – nurture*. This distinction does not play as important a role here as it does in the context of intelligence. One reason could be that there are no controlled twin studies of problem solving: the dependent variable of interest has always been intelligence, not problem solving. A lack of research data therefore forestalls conclusions.

(2) *Rationalism – empiricism*. As has been said before, rationalists see an advantage in the use of theories, whereas empiricists rely more on data. In problem solving research, we need both: a strong theory that makes predictions about behavior, and good experiments that deliver reliable data.

(3) *Structures – processes*. Problem solving is *per definitionem* more a matter of processes than of structures, but in fact most studies using problem solving measures (like those used for the worldwide PISA problem solving assessment of 15-year-old students; see Csapó & Funke, 2017) rely on performance evaluation in terms of solution quality; few indicators of process are available. With the advent of computer-based assessments of problem solving, however, log-file analyses have become new data sources for process evaluation (Ramalingam & Adams, 2018).

(4) *Domain-generality – domain-specificity*. This is an important distinction in problem solving research. Heuristics (rules of thumb) are differentiated with respect to their generality: there are general-purpose strategies such as means-ends analysis (i.e., considering the obstacles that prevent the direct transformation from an initial problem state to the goal state and formulating subgoals to overcome those obstacles), and there are domain-specific solution strategies, such as those for finding a bug in a software program, that can be used only under certain circumstances.

(5) *Lab studies – ecological validity*. There is a group of researchers in the field (see Lipshitz, Klein, Orasanu, & Salas, 2001; for a summary, Klein, 2008) working under the label of "naturalistic decision making" (NDM). They emphasize (1) the time pressure, uncertainty, ill-defined goals, high personal stakes, and other complexities that characterize decision making in real-world settings; (2) the importance of studying people who have some degree of expertise; and (3) how people size up situations, as opposed to how they select between courses of action. They criticize lab studies for lacking ecological validity. As it has turned out recently, however, the differences between the two sides are smaller than once thought (Kahneman & Klein, 2009).

(6) *Basic research – applied research*. Most current research on problem solving focuses on basic issues, but the field for applications is wide open. Especially with complex problem solving (i.e., complicated, ill-defined problems), political and economic problems come into the research focus. For example, Dörner and Güss (2011) analyzed Adolf Hitler's decision-making style and identified a specific strategy the dictator used for solving political problems.

(7) *Biological methods – behavioral methods*. Recently, there have been some studies conducted with fMRI methods (Anderson, Albert, & Fincham, 2005; Anderson et al., 2008). But the use of biological methods is still lacking in large portions of the research arena of problem solving. One reason for this lack of research is the complexity of higher cognitive processes.
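To make theme (3) above concrete, here is a minimal, hypothetical sketch of a log-file analysis for process evaluation. The log format and the indicators computed (latency before the first action, number of exploratory interventions, total time on task) are our own illustrative inventions, not the actual measures used in PISA or by Ramalingam and Adams (2018).

```python
from datetime import datetime

# Toy interaction log in an invented format: (timestamp, event).
# In a real computer-based assessment, such events would be recorded
# automatically while the student works on a problem.
log = [
    ("2018-01-01 10:00:00", "item_start"),
    ("2018-01-01 10:00:12", "slider_moved"),
    ("2018-01-01 10:00:20", "slider_moved"),
    ("2018-01-01 10:00:41", "answer_submitted"),
]

def process_indicators(log):
    """Derive simple (invented) process indicators from a timestamped log."""
    ts = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t, _ in log]
    events = [e for _, e in log]
    first_action = (ts[1] - ts[0]).total_seconds()  # latency before acting
    interventions = events.count("slider_moved")    # how often the system was probed
    total_time = (ts[-1] - ts[0]).total_seconds()   # total time on task
    return {"first_action_s": first_action,
            "interventions": interventions,
            "total_time_s": total_time}

print(process_indicators(log))
# {'first_action_s': 12.0, 'interventions': 2, 'total_time_s': 41.0}
```

Whereas a solution-quality score collapses the whole episode into one number, indicators like these preserve information about *how* the solution was reached.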
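The means-ends analysis heuristic mentioned under theme (4) can also be sketched in code. The following is a hypothetical, greedy core of the heuristic: at each step it measures the difference between the current state and the goal state and applies whichever operator most reduces that difference. Subgoal formulation is omitted for brevity, and the toy domain, operators, and difference function are invented for illustration.

```python
def means_ends(state, goal, operators, difference, max_steps=100):
    """Greedy means-ends loop: repeatedly apply the operator whose
    result lies closest to the goal, until the difference is zero."""
    path = [state]
    for _ in range(max_steps):
        if difference(state, goal) == 0:
            return path  # goal reached
        # generate all successor states and keep the one nearest the goal
        candidates = [op(state) for op in operators]
        state = min(candidates, key=lambda s: difference(s, goal))
        path.append(state)
    return None  # gave up: difference never reduced to zero

# Toy domain: reach 10 from 1 using the operators "double" and "add 1".
ops = [lambda n: n * 2, lambda n: n + 1]
diff = lambda s, g: abs(g - s)
print(means_ends(1, 10, ops, diff))  # [1, 2, 4, 8, 9, 10]
```

Note how the general-purpose loop knows nothing about arithmetic; all domain knowledge lives in the operators and the difference function, which is exactly what makes the strategy domain-general.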

Summarizing, we can say that the application of the seven themes to the field of problem solving research does work: the themes can be found here, too. They will likely surface throughout the chapters of this book, some more clearly than others.

# 1.4 Conclusion

Human thought is a fertile field for investigation. Almost all the problems we solve and decisions we make depend on human thought. We have argued that seven themes pervade much of research on human thought. We have used human intelligence and problem solving as examples of how these themes are pervasive.

There is no one "best" method for studying human thought. Rather, one wants to use a variety of *converging operations* (Garner, Hake, & Eriksen, 1956)—different methods that converge upon the same substantive results—to understand human thought. This book will show you the astonishing number of different ways converging operations have been used to help us all learn how we think and use that thought to adapt to and shape the world in which we live.

#### Summary

This chapter introduces the psychology of human thought. It opens by considering what the field encompasses and, at a general level, how investigations of human thought proceed—through theories generating hypotheses that lead to experiments whose data can be analyzed. The chapter then considers seven themes that pervade research in the psychology of human thought, taking as an example research on human intelligence, where all seven themes have permeated research ever since the field began. The seven themes are nature and nurture, rationalism and empiricism, structures and processes, domain generality and domain specificity, validity of causal inferences and ecological validity, basic and applied research, and biological and behavioral methods. The chapter concludes that the psychology of human thought is best investigated through a melding of *converging operations*, that is, by multiple kinds of methods that one hopes will yield mutually confirming results.

#### Review Questions


# References


# Glossary


# Chapter 2

# History of the Field of the Psychology of Human Thought

ROBERT J. STERNBERG

Cornell University

Why should students bother to learn anything at all about the history of the field? On the very day I write this chapter, a younger colleague, an assistant professor, told me she is interested in the future of the field, not its past. Yet, there are three major reasons to study the history of psychology in general, and of the psychology of human thought, in particular.

First, many contemporary ideas can be better understood if we understand their historical context. For example, when trying to understand ideas about whether propensities toward language are inborn or acquired, it helps to understand the history of rationalism and empiricism and how they have influenced this and other debates about human propensities. Indeed, the debate between those who emphasize inborn traits and those who emphasize environmental influences truly cannot be well understood without understanding the nature of rationalism and empiricism. Moreover, current views on gene × environment interaction are a product of a long and, as it happens, largely fruitless debate between those who wanted to understand human behavior as almost entirely genetically programmed (some early behavior geneticists) and those who wanted to understand it as driven almost entirely by experiences in the environment (some early behaviorists).

Second, knowledge of history prevents us from claiming original credit for ideas that are steeped in the history of the field. Put another way, historical knowledge prevents us from "reinventing the wheel." Imagine if society had no knowledge of past inventions, and instead of dreaming up new inventions, kept reinventing the same things, again and again. Science is no different. For science to advance, scientists have to be aware of what ideas have already been proposed.

Third, we need to know which ideas from the past worked well and which worked poorly. Knowledge of the history of a field can prevent us from repeating mistakes that others already have made. When one reads the history of the field, one sometimes feels amazement at ideas people once held, such as the validity of phrenology (studying patterns of bumps on the head) for understanding people's personalities. But if we do not learn from these past mistakes, what is to stop us from making them again?

For example, why bother to read how Jerome Bruner and his colleagues studied concepts and concept learning in 1956 (Bruner, Goodnow, & Austin, 1956)? The idea of studying such simplified concepts was that one could study some kind of "pure" concept formation, unfettered and unimpeded by individual and group differences in prior knowledge. If different shapes, sizes, color names, and so forth were used, everyone would be at the same level of advantage–and disadvantage.

But later studies revealed that things did not work that way. Rosch (1975) found that how people form concepts about concrete items, such as kinds of animal or plant life, bears little resemblance to how people form concepts about abstract items. Moreover, concepts have a "basic level," a level at which we tend to think most easily about them. For example, people find it easier to think in terms of concepts at the basic level of "bird" than at the superordinate level of "chordata." Understanding the evolution of concept-formation research will help future investigators realize that there may be differences in the way more abstract and more concrete concepts are conceived, so that they do not again make the mistake of thinking that all concepts are processed in the same way. Similarly, there are differences in the way people solve abstract, structured, IQ-test-like problems and more concrete, practical, and unstructured problems such as how to choose a mate (Frensch & Funke, 1995; Sternberg et al., 2000). Thus, one might wish to study problem solving in contexts that resemble the universe of tasks to which one wishes to generalize one's conclusions.

# 2.1 The Dialectical Development of Ideas

Many ideas in psychological science, in general, and in the field of human thought, in particular, proceed in a kind of dialectical progression. The idea of a dialectic was formulated by the philosopher Georg Hegel (1807/1931), who suggested that people think in one way for a while, a *thesis*; then they move on to a contrasting and sometimes contradictory way of seeing things, an *antithesis*; finally, they move on to an integrated view, a *synthesis*, whereby two ideas that had seemed contradictory no longer seem that way, but rather seem as though they can be integrated and understood as both being true, perhaps at different levels.

# 2.2 Early Western Antecedents of the Psychology of Human Thought

Where did the study of human thought begin, and when did it happen? The mythical origins of the psychology of human thought can be traced to the Greek myth of *Psyche*, whose name conveys the idea of a "breath of life," or, put another way, the soul, believed once and still by many to inhabit the body during life and to leave it upon a person's death. The Greek term *nous* (which once was believed to be a bodily organ responsible for the clear and coherent perception of truth) is an uncommon English word for the mind; *nous* particularly referred to thinking that involved deep reasoning or even reasoning that was divinely inspired. In the ancient Greek world, the body and the mind were viewed as largely distinct. The mind might cause activity in the body, but the mind nevertheless was independent of the activity of the body. This dialectic—of the mind and body as entirely separate or as unitary—continues even into the present day.

The origins of the study of the psychology of human thought can be traced to two distinct approaches to the understanding of human behavior: philosophy and physiology. Today, these two fields of inquiry are viewed almost as dialectically opposed. That is, philosophy is often viewed as involving speculative methods and physiology as involving empirical, largely scientific methods. But in ancient Greek times, many physiologists as well as philosophers believed that truth could be reached without the benefit of empirical methods.

As time went on, philosophy and physiology diverged more and more, with physiologists seeking out empirical methods that never interested philosophers. As time went on, several dialectics kept arising and re-arising in the study of the human mind—whether the mind and body are one entity or distinct entities; whether the mind is best understood through rationalistic or empirical methods; whether abilities are genetically or environmentally determined. The synthesis stage of each dialectic involved the recognition that the two positions are not necessarily opposed to each other—the ideas could be integrated. For example, abilities almost certainly have both genetically and environmentally influenced components, as well as a component influenced by the interaction between genes and environment.

Hippocrates, the ancient Greek physician and philosopher (ca. 460–377 B.C.E.), believed in mind-body dualism, or the notion that whereas the body is composed of physical substance, the mind is not. Hippocrates proposed that the mind resides in the brain. Although today this idea sounds rather obvious, many of his predecessors had different ideas about where the mind resided, ranging from the heart to the gods.

Plato (ca. 428–348 B.C.E.), who lived at roughly the same time as Hippocrates, agreed that the mind resided in the body, and in particular in the brain. In contrast, Aristotle (384–322 B.C.E.) believed that the mind resided in the heart. These two philosophers set up three important dialectics for the psychology of human thought—the relationship between the mind and the body, the use of empirical observations versus philosophical introspections as a means for discovering the truth, and the original source of our ideas.

Plato believed that reality inheres not in the concrete objects that we become aware of through our senses, but rather in abstract forms that these objects somehow represent. That is, the reality of you is not in your physical substance but rather in the abstract ideas you represent. The computer (or other device) on which you are reading this text is not real; rather, the abstract idea behind it is real. In contrast, Aristotle believed, as you probably do, that the reality of yourself is in your concrete substance and that the reality of your computer (or other device) is in that concrete device, not in the idea of it. According to Aristotle, the idea is derivative, rather than primary.

Plato's ideas led to the philosophy of mind-body dualism, whereas Aristotle's ideas led to *monism*, or the idea that the body and mind are of a single kind of reality, existing in a single plane. In this view, the mind is a byproduct of anatomical and physiological activity. It has no separate existence apart from this activity.

These different ideas about the nature of reality led Plato and Aristotle to different methodologies for investigating the nature of human thought. Plato was a rationalist, believing that introspection and related philosophical methods of analysis could and should be used to arrive at truth. After all, what purpose would there be in studying empirically the imperfect copies of reality that concrete objects represent? Rather, one would be better off using reflection to understand reality in the realm of abstract ideas.

In contrast, Aristotle was fundamentally an empiricist, believing that the nature of human thought could be best understood through observation and experimentation. We learn about reality by observing concrete objects, including ourselves. Because reality inheres in concrete objects, we learn best about them by studying them empirically.

Further, Plato believed that ideas are largely innate. That is, we are born with virtually all the ideas we have. Experience merely brings them out. In the dialogue *Meno*, Plato claimed to demonstrate (through Socrates, who generally was the main protagonist in the dialogues) that all the ideas about geometry that a slave boy had in his head were there at the boy's birth. Experience merely brought them out. In contrast, Aristotle believed that ideas generally arise through experience.

All of these dialectics—whether the mind and body are one entity or distinct entities; whether the mind is best understood through rationalistic or empirical methods; whether abilities are genetically or environmentally determined—are still active in research today that seeks to understand the human mind. Psychological scientists disagree even today as to the extent to which mind and body are distinct, on the roles of rationalistic and empirical methods, and on the origins of abilities.

# 2.3 Intermediate Periods in the Western History of Understanding Human Thought

During the early Christian era (200–450 C.E.) and the Middle Ages (400–1300 C.E.), rationalism and empiricism became subsidiary to the primacy of religious faith. Neither method was viewed as valid unless it demonstrated what was already "known" to be true on the basis of Christian doctrine. (Other views evolved in Eastern countries, but because modern psychological science is largely based on the Western tradition, that is what will be covered here.) This kind of logic—which is perhaps as prevalent today as in the past, just in different forms—shows the fallacy of confirmation bias, whereby we seek out information that is consistent with what we believe and ignore or reject information that is not consistent with our beliefs. More and more today, through social media and other means, people read only news feeds and websites that present views corresponding to those they already hold.

Modern views of science were born during the period of the Renaissance, roughly from the 1300s to the 1600s. The focus of psychological thinking shifted from God to humanity. Strict control of thinking in terms of religious doctrine came under attack. Now empirical observation, often guided by underlying theories, came into vogue as a preferred method for understanding human thought and other human phenomena.

# 2.4 The Early Modern Period (1600s to 1800s)

Interestingly, the Early Modern Period saw a replay of some of the dialectics that distinguished Plato and Aristotle. René Descartes (1596–1650), a philosopher, agreed with Plato's emphasis on rationalism as the best way to seek truth, and, like Plato, he was a dualist. Descartes further believed that knowledge is innate. In contrast, John Locke (1632–1704), also a philosopher, sided largely with Aristotle, believing in the primacy of empirical methods, monism, and the idea that all knowledge is acquired from experience. Locke took this view to an extreme, arguing that at birth the mind is a *tabula rasa*, or blank slate: we acquire knowledge through sensory experience, and thus the experiences we provide children are the keys to what they are able to learn in their lives. David Hume (1711–1776), another empiricist philosopher, sided with Locke in the belief that knowledge is acquired. He further pointed out that all our causal inferences are indirect: we see one thing happen and then, soon after, another, and infer causality. We can never observe causation directly—we can only come to believe it is true.

Two important successors to Descartes and Locke were the philosophers John Stuart Mill (1806–1873) and Immanuel Kant (1724–1804). Mill saw the mind entirely in mechanistic terms. He believed that the laws of the physical universe could explain everything, including our lives as human beings. His was an extreme form of monism, sometimes referred to as reductionism, a view that reduces the role of the mind to the status of the physical and chemical processes occurring in the body. Those today who see the mind as nothing more than the physiological operations of the brain and its accompanying central nervous system might be viewed as reductionists.

Kant provided syntheses to many of the theses and antitheses that had been proposed before him. He sought to understand how the mind and the body are related, rather than looking at one as subservient to the other. Kant also allowed roles for both *a priori* (rationally determined) and *a posteriori* (empirically determined) knowledge. What is perhaps today most important about Kant's contribution is the recognition that philosophical debates do not have to be "either-or," but rather can be "both-and," seeking roles, for example, both for inborn knowledge and for empirically derived knowledge.

# 2.5 The Modern Period of the Psychology of Human Thought

The modern period of the psychology of human thought can be seen as beginning with structuralism, which sought to understand the structure (configuration of elements) of the mind by analyzing the mind in terms of its constituent components or contents (see Table 2.1 for a comparison between this and other modern schools of thought). At the time structuralism was introduced, scientists in other fields also were trying to understand constituents, such as the periodic table of elements and the biochemical constituents of cells. Thus, structuralism was a part of a large movement in science to break things down into their basic elements.

Table 2.1: Main Schools of Thought in the History of the Psychology of Thought.

An important pre-structuralist was the German psychologist Wilhelm Wundt (1832–1920). Wundt argued that the study of cognition should concentrate on immediate and direct experience, not mediate and indirect experience. For example, if a subject looked at a tree, what would be important to Wundt, from a psychological point of view, would not be the identification of the object as a tree or a maple tree, but rather one's seeing a large cylinder with a rough brown surface jutting out into the air, with green protrusions (i.e., leaves) attached to smaller cylindrical objects (i.e., branches) jutting out from the main cylinder. Wundt suggested that the best way to study immediate experience was through introspection—that is, subjects reporting their direct and immediate experiences. Wundt believed that people could be trained to be experts at introspection, so that they would report exactly what they sensed without the mediation of their knowledge of concepts and categories (such as *tree* or *maple*).

Perhaps the first major structuralist was Edward Titchener (1867–1927), whose views were similar to Wundt's. Although Titchener started out as a strict structuralist, later in his career he branched out and considered other ways of studying human thought. Titchener's change of mindset illustrates an important lesson about scientific creativity: Scientists do not have to get stuck in, or fixated upon, the ideas that characterize their early work. They can "grow on the job," and themselves think dialectically, with their ideas evolving along with their careers.

Structuralism is of interest today primarily in an historical sense, because it was shown to have a number of problems. First, as time went on, the number of "elementary sensations" it proposed grew too large to be manageable. There seemed to be no limit, and so its promise of reducing experience to a manageable number of elementary sensations was lost. Second, to the extent that it was useful at all, it was for understanding simple aspects of human behavior rather than complex ones such as problem solving, reasoning, or language. Third, its heavy reliance on introspection came under attack. While introspection might be of some use, it scarcely seemed to be the only method, or even a primary method, by which knowledge about thinking could be gained. Moreover, people's introspections, no matter how well the people are trained, are subject to various kinds of biases as a function of their past experiences. Finally, different people had different introspections, so that it was difficult to reach agreement as to just what the basic sensations were.

# 2.6 Functionalism

Functionalism looked at the functional relationships between specific earlier stimuli and subsequent responses; in other words, it asked why people behave the way they do—how do events in a person's life lead the person to behave in certain ways but not others? Thus, functionalists asked a different set of questions from structuralists, concentrating less on *what* people experienced and more on *why* they experienced it.

Again, there is an important lesson to be learned from the evolution of psychological thinking from structuralism to functionalism. That lesson is that different schools of, or approaches to psychological thought, differ at least as much in the questions they ask as in the answers they obtain. When psychological science moves on, it is often not so much that the answers change as that the questions change.

The core beliefs of structuralists—seeking elementary sensations through analyses of introspection—were pretty well defined. The core beliefs of functionalists never cohered quite as well. Indeed, they used a variety of methods to answer their questions about the "why" of human behavior.

# 2.7 Pragmatism

Pragmatism, an outgrowth of functionalism, holds that knowledge is validated by its usefulness. The main question pragmatists are concerned with is that of how knowledge can be used to make some kind of a difference.

One of the most well-known pragmatists was William James (1842–1910), who was not only a psychologist but also a philosopher and a physician. His landmark work was *The Principles of Psychology* (James, 1890/1983). It is rare for a scholar to enter the pantheon of "most distinguished psychologists" on the strength of a single work, but James managed it with that one.

James critiqued structuralism's focus on minute details of experience. He believed instead that psychology needs to focus on bigger ideas. He is particularly well known for his theorizing about consciousness, which he believed was the key to people's adaptation to their environments.

John Dewey (1859–1952) applied pragmatism to a number of different areas of thought, most notably, education. Dewey emphasized the role of motivation in education (e.g., Dewey, 1910). In order to learn effectively, a student needs to see the use of what he or she learns. If the learning is irrelevant to a student's life, the student will have little incentive to process deeply the information that is taught. One way educators can motivate students is by having the students choose their own problems. In that way, the students will choose problems that interest them, whether or not they interest the teachers.

Dewey also believed in the value of applied research. Much of the research being done, he thought, had no obvious use and hence was not likely to make a long-lasting contribution. Pragmatism would argue for applied or at least life-relevant research that could be put to some use, even if not immediately.

Pragmatism remains a school of thought today: One frequently hears politicians argue for educational programs that prepare students for careers and that focus on knowledge that is readily applicable. But the advantages of pragmatism are, in some ways, also its disadvantages. First, it can lead to shortsightedness: much of the most important applied research of today emanated from the basic research of yesterday. Second, the school of thought raises the question "useful to whom?" Is it enough for an education to be useful to just one person? How about if it is useful to one person but useless to another? Finally, pragmatism can rest on a limited notion of usefulness. What is useful to a person at one time, in the short run, may not be useful to that person in the long run.

# 2.8 Associationism

Associationism concerns how ideas and events become associated with one another in the mind. Thus, it serves as a basis for a conception of learning—that learning happens through the association of ideas in the mind.

One of the most influential associationists was the German psychologist Hermann Ebbinghaus (1850–1909), the first investigator to apply associationist ideas experimentally. Whereas Wundt was an introspectionist, Ebbinghaus was an experimentalist; to the extent that he used introspection at all, he directed it at himself, for his main experimental subject was himself.

Edwin Guthrie (1886–1959) expanded upon Ebbinghaus's ideas about associationism, proposing that two observed events (a stimulus and a response) become associated with each other through close occurrence in time (temporal contiguity). In this view, stimulus and response become associated because they repeatedly occur at about the same time, with the response following the stimulus. Guthrie, however, studied animals rather than himself.

Edward Lee Thorndike (1874–1949) developed these ideas still further, suggesting that what is important is not mere temporal contiguity, but rather "satisfaction," or the existence of some reward. According to Thorndike's *law of effect*, a stimulus tends to produce a certain response (effect) over time if an organism is rewarded (satisfaction) for that response.
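Thorndike's law of effect lends itself to a small computational sketch. The toy model below is our own illustrative invention, not Thorndike's formulation: a simulated organism chooses among responses in proportion to their current associative strength, and only the rewarded response has its association strengthened, so over many trials it comes to dominate.

```python
import random

def train(trials=1000, lr=0.1, seed=0):
    """Toy law-of-effect learner: reward ('satisfaction') strengthens
    the stimulus-response association of the rewarded response."""
    rng = random.Random(seed)
    # associative strength of two candidate responses to the same stimulus
    strength = {"press_lever": 1.0, "scratch": 1.0}
    for _ in range(trials):
        # emit a response in proportion to its current associative strength
        response = rng.choices(list(strength), weights=list(strength.values()))[0]
        reward = 1.0 if response == "press_lever" else 0.0  # only lever-pressing "satisfies"
        strength[response] += lr * reward  # satisfaction strengthens the association
    return strength

print(train())  # "press_lever" ends up far stronger than "scratch"
```

Over many trials the rewarded response is emitted more and more often, mimicking how satisfaction "stamps in" an association, while the never-rewarded response keeps its initial strength.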

Associationism in its original form has not survived. The idea that complex behavior could be explained just on the basis of simple associations has never really worked well. None of the associationists ever gave a persuasive account of problem solving, reasoning, decision making, or any other higher process.

# 2.9 Behaviorism

Behaviorism is the view that psychology should deal only with observable behavior. It is in a sense an extreme form of associationism. It originated as a dialectical reaction against the focus on personally subjective mental states as emphasized both by structuralism and functionalism. Radical behaviorists argue that arguments regarding (internal) thought processes are merely speculations. In their view, although such speculations may have a place in philosophy, they do not have a place in the science of psychology. The behaviorist view was part of a movement called *logical positivism*, according to which the basis of all knowledge is sensory perception.

The father of the radical behaviorist movement was the American psychologist John Watson (1878–1958). Watson believed that psychology should focus only on observable behavior. He worked primarily with rats in his research, although he became famous, or infamous, for an experiment in which he conditioned a young child, "Little Albert," to fear a white rat, a fear that later generalized to other animals, such as a white rabbit (Watson & Rayner, 1920). A successor to Watson, Clark Hull (1884–1952), believed that it would be possible to synthesize the work of theorists like Watson and Guthrie with Pavlov's work on involuntary conditioning. He constructed elaborate mathematical models to achieve such a synthesis.

A famous successor to Hull was B. F. Skinner (1904–1990), also a radical behaviorist. Skinner believed that all behavior could be understood in terms of organisms emitting responses to environmental contingencies. Skinner applied his ideas about behaviorism to many different kinds of behavior, at first learning, but then also language and problem solving. His views may have had some success in accounting for simple learning but did less well in accounting for complex behavior.

Skinner also proposed that it would be possible to construct a Utopian society based on his ideas about instrumental conditioning (i.e., conditioning in which responses are shaped by rewards and nonrewards of behavior). Because Skinner believed the environment controls behavior, the idea of the Utopia was to create environments that would control behavior so that it would conform to the ideals of the community.

## 2.10 Gestalt Psychology

Gestalt psychology sought to understand behavior in terms of organized, structured wholes; that is, instead of breaking down behavior and its underlying cognition into constituent parts, Gestalt psychology sought to understand behavior holistically. Three of the main psychologists behind the movement, all German, were Max Wertheimer (1880–1943), Kurt Koffka (1886–1941), and Wolfgang Köhler (1887–1967). The Gestaltists applied their framework to many aspects of psychology, and especially to perception and complex problem solving. For example, they suggested that insight problems, in which one is blocked from any kind of solution until one has an "aha" experience, could be understood in terms of a holistic restructuring of a problem to reach a solution. An example would be the nine-dot problem, in which one has to connect nine dots, arranged in three rows of three, in four straight lines without taking one's pencil off the paper. The "insight" for solving the problem is that one has to go outside the implicit periphery of the nine dots in order to solve the problem.

## 2.11 Cognitivism

The main current paradigm for understanding the psychology of human thought is cognitivism, which is the belief that much of human behavior is comprehensible in terms of how people represent and process information. Cognitivists seek to understand elementary information processes and how they are represented in the mind.

Early cognitivists, such as Miller, Galanter, and Pribram (1960), argued that both behaviorist and Gestalt accounts of higher processes are inadequate. Instead, they suggested that psychologists need to understand cognitive processes. The unit they proposed was the TOTE (Test-Operate-Test-Exit). The idea behind this unit is that when we need to solve a problem, we first test the difference between where we are and where we need to be to reach a solution. We then operate to reduce the difference between our current state and the solution state. Then we test to see if we are done. If not, we operate again. We keep going until we reach a solution to the problem, at which point we exit.
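The TOTE unit is essentially a feedback loop. The following minimal Python sketch is our own illustration (the function name and the numeric example are hypothetical, not from Miller, Galanter, and Pribram):

```python
def tote(state, goal, operate, test=lambda s, g: s == g, max_steps=100):
    """Test-Operate-Test-Exit: test for the goal, operate to reduce the
    difference, test again, and exit once the goal state is reached."""
    for _ in range(max_steps):
        if test(state, goal):         # Test: are we at the goal?
            return state              # Exit
        state = operate(state, goal)  # Operate: reduce the difference
    raise RuntimeError("no solution within step limit")

# Hypothetical example: move a number toward a goal one unit at a time.
result = tote(0, 5, operate=lambda s, g: s + (1 if g > s else -1))
print(result)  # 5
```

The loop structure, not the particular operation, is the point: any problem-solving episode is modeled as repeated test-operate cycles terminated by a successful test.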

Two other pioneers in the study of human thought were Newell and Simon (1972), whose book *Human Problem Solving* showed how a relatively small set of elementary information processes could be used to solve problems of a wide variety of kinds. Neisser (1967), in his book *Cognitive Psychology*, suggested a process called analysis-by-synthesis, in which hypotheses are formulated and compared with data in the environment until one of the hypotheses produces a match to the data. In a later book, *Cognition and Reality*, Neisser (1976) emphasized the importance of studying complex human behavior in its natural contexts. Today, cognitivism thrives, but other schools of thought are complementing it. For example, more and more cognitive psychologists are seeking to understand not only the cognitive bases of complex behavior, but also its neuropsychological underpinnings.

# Acknowledgement

Parts of this chapter draw on ideas about the history of the field presented earlier in Chapter 2 of Sternberg and Ben-Zeev (2001).

#### Summary

The history of the study of human thought can be understood in terms of a dialectical progression of ideas. Many of these ideas originated with the Greek philosophers Plato and Aristotle, who believed in the importance of rationalist and empiricist methods, respectively, for understanding human thought. Plato's ideas formed the basis for mind-body dualism.

During the Middle Ages, ideas about human thought were seen as deriving from what individuals thought they knew about their relation to God. In the Renaissance, the scientific method began to gain ascendancy.

The rationalist and empiricist schools of thought gained exponents in philosophers René Descartes and John Locke, respectively. Immanuel Kant synthesized many of their ideas, showing that the methods of both rationalism and empiricism could be important in acquiring new knowledge.

In the early modern era, structuralism argued for the importance of decomposing sensations into their most elementary constituents. Functionalism, in contrast, emphasized the "why" of behavior rather than its constituents. An offshoot of functionalism, pragmatism, suggested we look for how knowledge could be used. Associationism argued for the importance of connections between ideas; behaviorism, especially in its radical form, suggested that only observable behavior should be studied by psychologists. Behaviorists were particularly concerned with the role of environmental rewards in behavior. Gestaltists suggested that behavior be studied as organized wholes, because the whole is more than the sum of its parts. Cognitivism, an important school even today, suggests the importance of understanding the mental structures and processes underlying behavior.

#### Review Questions


#### Hot Topic

Robert J. Sternberg

The dialectic plays a role not only across investigators over time but also within a single investigator over time (Sternberg, 2014, 2015). It is important for researchers to look not only at how research has evolved over historical time but also at how their own research program has evolved over the course of a career. If a researcher finds no evolution, then he or she perhaps has not been as creative as he or she could have been.

In my own research, I originally proposed an information-processing "alternative" to psychometric approaches to intelligence. At the time, the late 1970s, I saw an approach emphasizing information-processing components as replacing structural psychometric factors. But I later synthesized what had been a thesis and antithesis. Components and factors were compatible, with factors obtained through analysis of variation between people and components obtained through analysis of variation across stimuli. In other words, both components and factors were valid, but as different partitions of variation in a psychological study. Later this synthesis became a new thesis, as I argued that the approach I had used was too narrow and failed to take into account creative and practical aspects of intelligence, which complemented the analytical aspects dealt with in psychometric and cognitive approaches. I thought that I now had "the answer." But then I came to view the answer as incomplete, because I realized that what mattered more than one's particular cognitive or other skills was how one utilized these skills. So I came to argue that "successful intelligence" is the construction of a life path that makes sense in terms of one's own goals and initiatives, by capitalizing on one's strengths and compensating for or correcting one's weaknesses. But later, I came to see even this view as incomplete, because it neglected wisdom, or using one's knowledge and skills to help achieve a common good. And in today's world, I came to believe, what is most missing is not IQ points—there are lots of smart people, including so many people in universities—but rather the use of those "smarts" to help others and the world, not just oneself and one's loved ones.

In sum, the concept of a dialectic applies not only between but also within researchers. People need to realize and appreciate how their own ideas evolve and how, through the course of a career, one becomes not just older, but hopefully, in one's research and life, wiser.

#### References

Sternberg, R. J. (2014). I study what I stink at: Lessons learned from a career in psychology. *Annual Review of Psychology*, *65*(1), 1–16. doi:10.1146/annurev-psych-052913-074851

Sternberg, R. J. (2015). Still searching for the Zipperumpazoo: A reflection after 40 years. *Child Development Perspectives*, *9*(2), 106–110. doi:10.1111/cdep.12113


# Chapter 3

# Methods for Studying Human Thought

#### ARNDT BRÖDER

University of Mannheim

# 3.1 Introduction

As the other chapters of this book will reveal, the psychology of thinking is a fascinating research field which has produced many surprising insights into this faculty of the human mind. How to investigate something "invisible" such as thoughts is an interesting philosophical problem and a research topic in itself. This chapter will start with the methodological foundation of cognitive psychology and the question as to why scientists do not just rely on people's reports about their thoughts as data. Then, I will provide an overview of the toolbox of methods that cognitive psychologists have developed for gaining insights into thinking. Most methods will be illustrated by one or two selected examples, but it should be kept in mind that the range of possible applications is much broader. There is no fixed recipe for doing research on thinking, so psychologists can still be creative in developing new methods and in freshly combining old ones. This methodological challenge is one further aspect which makes research in cognitive science so intriguing.

Readers who want to recapitulate a few basics on the methods of psychology may want to consult Textbox 3.1 first.

#### Textbox 3.1: A brief primer of basic methods in empirical psychology

Psychological laws or hypotheses typically claim that one independent variable (IV) has some influence on another variable called the dependent variable (DV). For example, it may be claimed that the more "deeply" information is processed, the better it will be remembered later (Craik & Lockhart, 1972). Here, the depth of processing is the IV, whereas memory performance is the DV. Theoretical psychological variables are themselves unobservable, but they may be operationalized by translating them into observable variables which are thought to represent the theoretical ones. For example, a shallow processing of information could entail counting the letters of written words, whereas deep processing is based on analyzing the meaning of the words. Likewise, memory performance may be measured by tallying the words someone can recall in a later test. If the hypothesis (or law) is true and the operationalization is adequate, both variables must show a covariation. Empirical tests of psychological hypotheses therefore assess whether such a predicted correlation exists. In a correlation study, researchers measure or observe both variables of interest and assess their covariation. However, the correlation in such a study does not allow the conclusion that the IV variation *caused* the DV change since they might both be influenced by a third variable. For example, the motivation of a participant might influence both the learning strategy and the memory performance without a direct causal link between these variables. To test *causal* hypotheses, scientists try to run experiments whenever possible. Here, they can actively *manipulate* the IV (for example by instructing participants either to count letters or to find a meaningful associate to words). If participants are randomly assigned to the different experimental conditions (so that there are no systematic differences between them), an observed change in the DV has probably been *caused* by the variation in the IV. 
Experiments are therefore stricter tests of causal hypotheses than correlation studies.

# 3.2 A Natural Science of the Mind?

How can thoughts be studied scientifically? When reflecting on the natural sciences, we imagine researchers investigating *things* that can be *observed* or even *measured* in objective and precise ways. Thoughts, however, come as beliefs, imaginations, intentions, logical inferences, fantasies, insights, daydreaming, or plans, to name only a few of the many concepts associated with thinking. These immaterial "things" do not have a weight or size or electric charge that can be measured with physical instruments.<sup>1</sup> Furthermore, these thoughts are unobservable for outsiders and hence, they seem to evade an objective description.

Since they considered verbal reports based on so-called introspection (self-observation) as unreliable sources of data, philosophers and even the founder of Experimental Psychology, Wilhelm Wundt (1832–1920, see Figure 3.1), were convinced that higher cognitive processes like memory and thinking could not be studied with the methods of the natural sciences. Beginning with John B. Watson's (1913) "behaviorist manifesto", all internal psychological processes including thoughts were banished from scientific psychology for a few decades because verbal data were considered subjective and thus not suited for scientific research (see Chapter 2, "History of the Field of the Psychology of Human Thought").

This state of affairs was unfortunate because, in his groundbreaking experimental investigations of human memory, the German psychologist Hermann Ebbinghaus (1850–1909) had already shown how higher cognitive processes can be studied objectively without using subjective verbal reports as data. In principle, the methodological idea behind modern cognitive psychology foreshadowed by Ebbinghaus (1885) is simple: although cognitive processes like thoughts or memory traces are by themselves unobservable, they may lead to *observable consequences* in behavior which can be objectively noticed and described by different independent observers. Hence, hypotheses about these hidden or latent processes can be tested by setting up experiments and observations that target these predicted behavioral consequences as objective data. To use an example from memory research as founded by Ebbinghaus (1885), we may postulate that newly learned material leaves a hypothetical "trace" in memory which may vary in strength. This trace itself is unobservable, but one can show that it is "there", for example, when people are able to reproduce the material in a later memory test or even show faster responses to these stimuli in comparison to control stimuli they had not learned before. The test results (amount of recall or speed of reaction) are indicators of the memory strength, and they can serve as objective data for testing hypotheses about it. In the study of thinking, for example, the number of solved test items may be an indicator of a certain facet of intelligence (see Chapter 14, "Intelligence"), or the response to a logical puzzle may indicate whether someone followed the laws of logic or rather an intuitive sense of credibility of the conclusion's content (see Chapter 7, "Deductive Reasoning", belief bias).<sup>2</sup>

<sup>1</sup> Most psychologists including myself believe for good reasons that all thoughts have a *material basis* since they strictly depend on processes in the brain. However, a belief or an insight, for example, has a psychological surplus dimension (a *meaning*) that cannot hitherto be reduced to electrical and chemical processes in the brain (some say it never will be). The psychology of thinking benefits a lot from knowledge about the brain (see Section 3.4.2.6), but it deals with the *semantics* (meaning) of thoughts in human behavior, which is exactly this surplus dimension on top of the physical processes.

Hence, as in other natural sciences, psychologists can test hypotheses about unobservable variables by objectively observing or measuring their behavioral consequences. As the American psychologist Edward C. Tolman (1886–1959) argued, this kind of research strategy (later called methodological behaviorism) makes it possible both (1) to use unobservable theoretical concepts in a scientific manner and (2) to do so without recourse to questionable introspective data. Basically, this view is still the methodological basis of modern cognitive psychology.

# 3.3 Why not just Ask People about their Thoughts?

Reading this introduction, you may wonder why psychologists go about things in such a complicated way. Why not just ask people about their thoughts in order to investigate thinking? They know best, don't they?

In fact, one of the first heated methodological debates in the then young science of Experimental Psychology was between Wilhelm Wundt (1907; 1908) and Karl Bühler (1908) about the value of *introspection* as a means of investigating thinking. Introspection literally means "viewing inside" and was used, for example, by psychologists of the Würzburg School of Psychology to gain insights into thought processes. Confronted with a thinking problem, the test person was asked to observe her own thinking processes and later report them to the researcher. In rare agreement, both Wundt (1907; 1908) and the founder of behaviorism, John B. Watson (1913), criticized the "interrogation method" as unscientific for the following reasons, still accepted by most psychologists today (see Massen & Bredenkamp, 2005; Russo, E. J. Johnson & Stephens, 1989):


With respect to reactivity, Wundt (1908) even doubted that it is logically possible to split one's consciousness into two independent parts, the thinker and the observer. And with respect to subjectivity, Watson (1913) bemoaned that, "There is no longer any guarantee that we all mean the same thing when we use the terms now current in psychology" (p. 163 f.).

In an attempt to vindicate verbal reports, Ericsson and Simon (1993) later championed the thinking-aloud method, a method less prone to memory error and reactivity. Here, test persons are encouraged to verbalize everything that comes to mind during the thinking process, without any instruction to explicitly "observe" their thoughts. These verbal protocols are later analyzed qualitatively, and Ericsson and Moxley (2019) provide extensive practical information on how to set up studies and how to analyze protocol data. However, this method does not solve problems 2, 3, and 5 of the above list, and even reactivity has been demonstrated in some studies (Russo et al., 1989; Schooler, Ohlsson & Brooks, 1993).

<sup>2</sup> This "indirect" measurement of theoretical variables is not unique to psychology, but also commonly used in other natural sciences, for example physics, where the mass of a particle may be inferred from its movement in a magnetic field, or the speed of distant stars by a shift of their spectral lines.

#### Bröder Methods

Figure 3.1: Three important methodological forerunners of experimental cognitive psychology.

In light of the arguments above, are verbal data therefore worthless for investigating thought processes? This conclusion would be too harsh, especially with respect to thinking-aloud data. These and also classical introspective reports may be worthwhile in helping researchers to *generate* hypotheses about cognitive processes. In order to *test* these hypotheses empirically, however, one has to rely on objective data.

# 3.4 Objective Methods for Investigating Thought Processes

Psychologists have been quite creative in developing empirical methods for testing hypotheses about thought processes. The following section describes various methods. As we will see, although the methods can sometimes be subsumed under joint categories like, for example, "response time analysis" (Section 3.4.2.1), the applications vary considerably depending on the specific task, theory, or hypothesis under scrutiny.

We will start with the simple idea that we can test hypotheses about thoughts by simply looking at the *outcomes* of the process, such as the quality or duration of a problem solution. The second and longest section will illustrate several methods that claim to more closely mirror the *processes* taking place during thinking. Finally, we will add very brief sections about *computer* simulations and *neuroscientific* methods in thinking research.

## 3.4.1 Outcome-based Methods

Observable behaviors like finding a problem solution, choosing an option, or accepting a logical conclusion are the *results* of thought processes, but can they reveal information about the unobservable processes themselves? For example, large parts of research on creative problem solving (see Chapter 9, "Problem Solving") are based on a simple dependent variable, namely the percentage of participants who solved a problem, typically a hard-to-solve riddle. Whether this reveals insights into the processes involved depends on how you set up your study to test hypotheses. If you vary an *independent variable* which is believed to change certain thinking processes that either facilitate or impede successful problem solving, differences in solving rates between the conditions of your experiment speak directly to the hypothesis under test. Next to simple solution rates and choices, more sophisticated methods utilizing behavioral outcomes allow conclusions about underlying processes by designing *diagnostic tasks* or even by the *model-based* disentangling of the processes involved. We will illustrate the three methods in turn with selected examples.

Figure 3.2: (a) Example of a matchstick puzzle: you are allowed to move only one matchstick to achieve a valid equation with Roman numerals. (b) The nine-dot problem: connect all dots with four straight lines without lifting the pen. (c) The ten-coins problem: turn the triangle upside down by moving only three coins.

*Simple solution rates.* Whether unconscious "hints" can foster a problem solution has been controversial since Maier's (1931) anecdotal observation that they can. In more recent studies using matchstick puzzles (Knoblich & Wartenberg, 1998) or the notorious "nine-dots" and "ten-coins" problems (Hattori, Sloman & Orita, 2013; see Figure 3.2), researchers presented hints to the solution so briefly that they were not consciously registered by the participants. Still, in Hattori et al.'s study, solution rates for the nine-dots and ten-coins problems tripled and increased fivefold, respectively, as compared to a control condition without these brief hints. On the premise that the hints were truly unconscious,<sup>3</sup> the outcome data therefore reveal a lot about the nature of problem solving processes. By simply registering success rates as the main dependent variable, numerous facilitating and impeding factors for creative problem solving have been identified (e.g., Bassok & Novick, 2012; Funke, 2003; see Chapter 9, "Problem Solving"). In a similar vein, large parts of reasoning research have used solution rates of logical arguments to investigate the factors which make logical problems easy or difficult (e.g., Johnson-Laird & Byrne, 1991) or to compare the cognitive abilities of different people.
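A minimal sketch of such an outcome-based comparison (the counts below are invented for illustration and are not Hattori et al.'s data): given solved/total counts per condition, one can compute the solution rates and a standard two-proportion z statistic for the difference.

```python
from math import sqrt

def two_proportion_z(solved_a, n_a, solved_b, n_b):
    """z statistic for the difference between two solution rates,
    using the pooled rate for the standard error."""
    p_a, p_b = solved_a / n_a, solved_b / n_b
    p_pool = (solved_a + solved_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Invented counts: 30/50 solved with a brief hint vs. 10/50 without.
z = two_proportion_z(30, 50, 10, 50)
print(round(z, 2))  # 4.08, far beyond the conventional 1.96 criterion
```

With such a large z, the difference in solution rates would be very unlikely under the null hypothesis that the hint has no effect, which is the inferential step behind "solution rates tripled" claims.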

*Diagnostic task selection.* Another example of how pure outcome measures may reveal information about latent processes uses the logic of *diagnostic tasks*, meaning that you choose tasks in a way that different processes or strategies predict different solutions or choices for a set of problems. You can then compare a subject's pattern of actual choices across these tasks with the predictions of the hypothetical strategies you are interested in. The strategy with predictions most "similar" to your actual data is presumably the one the participant used. There are different formal ways of assessing this similarity

<sup>3</sup> Whether this is the case with "subliminal" priming is still a matter of debate. I assume it to be true for the illustrative purpose of the example.

between predictions and data, and conclusions are subject to statistical error, but we will not deal with these complications here. As a general conclusion, it can be stated that pure outcome data may well provide information on detailed process hypotheses, given that these hypotheses make sufficiently different predictions for a set of tasks. An example of this research strategy is given in Textbox 3.2.<sup>4</sup>

# Textbox 3.2: Which strategies do people use in memory-based inferences?

Bröder and Schiffer (2003) were interested in which strategies people use when they have to make decisions from memory. In their task, participants had to compare different suspects in a hypothetical murder case and choose the one most likely to be the perpetrator. At the beginning of the experiment, participants had learned facts about the 10 suspects by heart (e.g., their blood type, their preferred cigarette and perfume brands, their vehicle). Later, they had received information about the evidence found at the crime scene. Based on the literature on decision strategies, the authors had identified four plausible strategies: the heuristic named *Take-the-best* (TTB; Gigerenzer & Goldstein, 1996) will look up the most important piece of evidence and base its decision on this evidence if it discriminates; otherwise, it will use the next most important piece of evidence, and so on. A *weighted additive rule* (WADD), in contrast, will look up all information and weigh it according to its importance. A *tallying rule* (TALLY) will compare the suspects simply on the number of matching pieces of evidence. Finally, participants might simply *guess*. The table shows three different task types, with the importance of the evidence decreasing from top to bottom:


Across the three item types, TTB would predict the choices of Suspect 1, Suspect 3, and Suspect 5, whereas a participant using WADD would choose Suspects 2, 3, and 5. Someone relying on a pure tallying strategy would select Suspects 2 and 3, but be indifferent (guess with equal probability) between Suspect 5 and 6. Finally, pure guessers would select all suspects in equal proportions. Based on a few assumptions (see Bröder, 2010, for details), the probability of an empirical data pattern can be assessed for each hypothetical strategy, and the strategy with the highest probability of the observed data is diagnosed as the participant's strategy. Bröder and Schiffer (2003, Experiment 1) found a surprisingly high percentage (64%) of participants presumably using a simple TTB heuristic, and a later analysis of response times by Bröder and Gaissmaier (2007) fitted well with this interpretation (see Textbox 3.3).
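Because these strategies are defined algorithmically, their predictions can be sketched in a few lines of Python. The cue patterns and weights below are hypothetical stand-ins, not the item types from Bröder and Schiffer's table; the point is only that the strategies can disagree about the same evidence, which is what makes tasks diagnostic.

```python
def ttb(options, order):
    """Take-the-best: use cues in order of importance; decide on the
    first cue that discriminates, otherwise guess among all options."""
    for cue in order:
        values = [opt[cue] for opt in options]
        if len(set(values)) > 1:
            best = max(values)
            return [i for i, v in enumerate(values) if v == best]
    return list(range(len(options)))

def wadd(options, weights):
    """Weighted additive rule: sum all cues weighted by importance."""
    scores = [sum(w * opt[c] for c, w in weights.items()) for opt in options]
    return [i for i, s in enumerate(scores) if s == max(scores)]

def tally(options):
    """Tallying: count matching pieces of evidence, ignoring importance."""
    scores = [sum(opt.values()) for opt in options]
    return [i for i, s in enumerate(scores) if s == max(scores)]

# Hypothetical item: two suspects, three binary cues (1 = matches the
# crime-scene evidence), importance: blood > cigarette = car.
suspects = [
    {"blood": 1, "cigarette": 0, "car": 0},
    {"blood": 0, "cigarette": 1, "car": 1},
]
weights = {"blood": 3, "cigarette": 2, "car": 2}

print(ttb(suspects, ["blood", "cigarette", "car"]))  # [0]
print(wadd(suspects, weights))                        # [1]
print(tally(suspects))                                # [1]
```

On this item, TTB picks Suspect 1 (the most important cue discriminates), while WADD and TALLY pick Suspect 2; a set of such items with diverging predictions lets the observed choice pattern identify the strategy.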

<sup>4</sup> The more the predictions of various strategies differ, the firmer your conclusion about underlying strategies. A method for maximizing the diagnosticity of tasks is described in Jekel, Fiedler, and Glöckner (2011).

*Model-based measurement of processes.* Finally, detailed information about cognitive processes can be achieved by *measurement models* that formalize assumptions as to how latent processes interact to produce the behavioral outcomes. The processes are represented as *parameters* in a set of equations, and the values of these parameters are estimated from the observed data. This sounds quite abstract, so we provide an example depicted in Figure 3.3. This model, formulated by Klauer, Musch, and Naumer (2000), was developed to investigate *belief bias* in syllogistic reasoning (see Chapter 7, "Deductive Reasoning"). Belief bias describes the phenomenon that people tend to accept plausible conclusions more readily than implausible ones, irrespective of the logical validity of the argument. For example, the syllogism "All vegetarians are peaceable. X is a vegetarian. Therefore, X is peaceable" is a logically valid argument since the conclusion follows from the two premises. However, if "X" is replaced by "Mahatma Gandhi", people are more ready to accept the argument as valid than if X is replaced with "Adolf Hitler".<sup>5</sup> Klauer et al. (2000) formulated the processing tree model depicted in Figure 3.3, which decomposes participants' judgments ("valid" vs. "invalid") of four different types of syllogisms (valid and invalid arguments with plausible vs. implausible conclusion statements) into logical processes and biased guessing. Logical processes are represented by the *r* parameters, and guessing based on plausibility by the *a* parameters. Given certain assumptions and experimental procedures, the parameters can be estimated from the data, and they allow for diagnosing whether experimentally manipulated variables like time pressure, working memory load, or the percentage of valid syllogisms in the task affect logical abilities (reflected in *r*) or rather the readiness to accept conclusions irrespective of the logical validity (reflected in *a*).
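The core logic of such a processing tree can be written down directly. The sketch below is a deliberately simplified rendering with invented parameter values, not Klauer et al.'s full model (which distinguishes several *r* parameters and makes further assumptions): with probability *r* the logical status of the syllogism is detected and reported; otherwise the reasoner guesses "valid" with a believability-dependent probability *a*.

```python
def p_valid_response(is_valid, r, a):
    """Simplified processing-tree sketch: with probability r the logical
    status is detected and reported correctly; otherwise the reasoner
    guesses 'valid' with probability a (depending on believability)."""
    if is_valid:
        return r + (1 - r) * a  # detected as valid, or guessed "valid"
    return (1 - r) * a          # only a guess can yield "valid" here

# Illustrative (invented) parameter values: modest reasoning ability,
# strong bias to accept believable conclusions.
r = 0.5
a_believable, a_unbelievable = 0.8, 0.2

# Belief bias: acceptance of *invalid* arguments depends on believability.
print(p_valid_response(False, r, a_believable))    # 0.4
print(p_valid_response(False, r, a_unbelievable))  # 0.1
print(p_valid_response(True, r, a_believable))     # 0.9
```

Fitting runs in the opposite direction: the observed response frequencies for the four syllogism types are used to estimate *r* and the *a* parameters, so an experimental manipulation can be located in either the reasoning or the guessing component.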

Such measurement models have been developed for various tasks in cognitive psychology, including memory, perception, decision making, and logical thinking (see Batchelder & Riefer, 1999, and Erdfelder et al., 2009; for comprehensive overviews). If a measurement model has been validated in thorough experimental tests, it allows the drawing of very detailed conclusions about the underlying processes of observed behavior.

*Evaluation:* As we have seen, focusing on the outcomes of thought processes as objective data may yield much more evidence about the underlying processes than is evident at first glance. In the case of simple success rates as a dependent variable, an obvious advantage is that these are objectively measurable and do not require complex assumptions about their validity as measures. Diagnostic task selection and model-based disentanglement of processes need more assumptions (which should ideally be validated in systematic studies), but this comes with the payoff of sometimes quite detailed information about the underlying processes. As we will see in the next section, additional process measures can often enrich the data by adding valuable information.

## 3.4.2 Process-oriented Methods

As Schulte-Mecklenbeck et al. (2017, p. 446) have argued, "process models deserve process data". Since cognitive theories try to describe the processes that go on in our minds while thinking, it would be worthwhile to elicit data which more directly reflect these processes instead of just focusing on their results. Also, pure outcome data are often not diagnostic enough to differentiate between theoretical models which may make the same predictions for many tasks (see, for example, item type 2 in Textbox 3.2, for which both TTB and WADD predict the same choice).<sup>6</sup> Although there is no consensus yet as to what a cognitive process actually *is* (see Newen, 2017), a defining feature of any kind of process is that it evolves over time. Hence, we will start with this most general property of cognitive processes, reflected in response time data.

<sup>5</sup> Both historical persons were vegetarians. Hence, there is obviously something wrong with the first premise, but the *conclusion* has to follow from the premises *if* they were true.

<sup>6</sup> Some authors enthusiastic about process data evoke the impression that process data would be *necessary* to test process models in a sensible manner. As Section 3.4.1 has shown, this is not the case, and I have argued elsewhere that outcome data are sufficient if they are diagnostic and formally linked to the process models under scrutiny (Bröder, 2000). I admit, however, that process data often increase the diagnosticity of the data and are therefore quite useful for research.

Figure 3.3: Multinomial processing tree model by Klauer et al. (2000) to assess logical reasoning and biased guessing in syllogisms. Each tree depicts processes for all the combinations of invalid vs. valid syllogisms with believable vs. unbelievable conclusions. Parameters *r* reflect reasoning, parameters *a* reflect biased guessing. © American Psychological Association. Reprinted with permission.

#### 3.4.2.1 Response Time Analysis

Response times are a major workhorse of cognitive psychology. They are useful for estimating the duration of component processes, they can be analyzed to estimate cognitive parameters in decision models, and they can be used to test cognitive theories.

*Measuring the duration of cognitive processes.* The first scientist to measure the duration of a simple cognitive process was presumably Frans C. Donders (1868) at the University of Utrecht in the Netherlands. We may smile today at his experimental setup, but in fact it was a scientific revolution because it pulled the actions of the mind into the realm of measurable natural science. He invented what later became known as the *subtraction method*: using the regular oscillations of a tuning fork as a timer, he measured the simple reaction time of a colleague repeating a syllable like "ki" when the listener knew in advance which syllable he would hear. In a second set of trials, the test person did *not* know in advance whether he would have to repeat "ki", "ku", "ke", or "ko". Repeating the stimulus without advance knowledge took on average 46 ms (milliseconds) longer. Donders concluded that this difference was precisely the time needed to *choose* between the potential responses, which was the only additional cognitive process required in the second task. Shortly after this revolutionary invention, reaction time measurement for the analysis of simple processes became a fashionable method in the newly established psychological laboratories, which also triggered technical developments for precise time measurement such as Hipp's chronoscope (see Figure 3.4). Although the subtraction method is preferably applied to perceptual tasks, there have been fruitful applications to processes of language understanding as well (Clark & Chase, 1972, 1974), showing that processes of sentence transformation and encoding a negation each require certain amounts of time. Hence, the general logic of the subtraction method is to contrast variants of speeded tasks that include or exclude specific component processes (such as negating a statement) and to generate a set of additive equations in order to estimate the durations of the component processes by simple difference calculations.

Figure 3.4: An early experimental setup (c. 1900) for the precise measurement of verbal reaction times. The memory apparatus on the left displays a stimulus and starts the chronoscope (middle); the verbal reaction is recorded by the voice key on the right, which closes a circuit and stops the chronoscope (taken from Schulze, 1909).
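To make the subtraction logic concrete, here is a minimal arithmetic sketch in Python. The mean for the simple task is an invented placeholder; only the 46 ms difference is the figure reported above.

```python
# The subtraction method in miniature. The simple-task mean below is an
# invented placeholder; the 46 ms difference is the one reported above.
rt_simple = 197.0  # task A: syllable known in advance (perceive + articulate)
rt_choice = 243.0  # task B: syllable unknown (perceive + choose + articulate)

# Task B adds exactly one process (choosing), so its duration is
# estimated as the difference of the two mean reaction times:
choice_duration = rt_choice - rt_simple
print(choice_duration)  # 46.0 (ms)
```

With more than two task variants, the same logic yields a system of additive equations whose differences estimate each component's duration.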

An obvious and severe limitation of the method is the need to find tasks that can be designed to differ in only one process. To relax this requirement, S. Sternberg (1969) proposed the widely used *additive factors method*, which does without this specific task construction and merely requires a decomposition of a task into processing stages that can be selectively influenced by experimental factors.

*Estimating parameters in cognitive models with reaction times.* Sometimes the researcher is not interested in the duration of processes per se; instead, reaction times are used as indicators of other aspects of cognition, such as ability or motivation. Particularly in research on decision making, various models have been developed that assume a process of *evidence accumulation* before a decision is made. For example, if I want to decide which of two bicycles to buy, I might sample evidence in favor or disfavor of each alternative (such as price, color, number of gears, weight, etc.) until a certain subjective threshold of confidence favoring one option over the other is reached. Decision situations like these can be explained by accumulation models, such as the *drift diffusion model* (DDM; Ratcliff, 1978) for simple perceptual and recognition decisions or *decision field theory* (DFT; Busemeyer & Townsend, 1993) for more complex decisions (which would apply to the bicycle example). Figure 3.5 depicts the DDM, but the general idea is similar in other models as well. Donkin and Brown (2018) discuss variants of accumulation models, their similarities, and their differences.

These models were initially developed to explain the *speed-accuracy tradeoff*: in many tasks, people can sacrifice accuracy for higher speed or respond more slowly but more accurately, a tradeoff that depends on both their ability and their motivation to be accurate.

Figure 3.5: The drift diffusion model. When a stimulus with moving dots is presented, the person starts to sample perceptual evidence for the options "right" vs. "left" until a subjective evidence threshold is met. The drift rate *v*, the threshold separation *a*, and the starting point *z* jointly determine the accuracy and the duration of the process.

Hence, looking only at error (or solution) rates or only at response times tells just half of the story. Suppose you have to decide in a perceptual task whether the majority of dots in a display of many randomly moving dots is moving to the right or to the left. According to the DDM, you start sampling perceptual evidence which, from moment to moment, may speak for one or the other direction, but on average it will favor one of the decision options and approach the respective subjective threshold. The average speed of this accumulation process toward one side is called the *drift rate v*, and it reflects the ease of the task (if you compare tasks) or the ability of the decision maker (if you compare people). The accuracy and the overall duration of the sampling process both depend on the *distance a* between the two subjective thresholds, which is under the control of the participant, who establishes a compromise between desired accuracy and speed. Furthermore, there may be a *bias z* favoring one of the answers (e.g., a tendency to respond "right" in the moving-dots task), reflected in the starting point of the sampling process (an unbiased starting point is *z = a/2*, halfway between the boundaries). Although the underlying mathematics is quite complicated, various computer programs exist to estimate the parameters *v*, *a*, and *z* from empirical response time distributions of correct answers and errors. Validation studies with various tasks have shown that the parameters *v*, *z*, and *a* indeed primarily reflect task ease (or ability), bias, and motivation to be accurate, respectively (Arnold, Bröder, & Bayen, 2015; Voss, Rothermund, & Voss, 2004). The model has been successfully applied to various domains of cognitive research (Ratcliff & Smith, 2015).
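The accumulation process itself can be sketched as a simple random walk. The following Python snippet is a toy simulation of the DDM's sampling process, not any published implementation; the parameter values and trial count are arbitrary choices for illustration.

```python
# A toy random-walk simulation of the drift diffusion model (DDM);
# parameter values and the trial count are arbitrary illustrations.
import random

def diffusion_trial(v, a, z, dt=0.001, s=1.0, rng=random):
    """Single trial: accumulate noisy evidence until a boundary is hit.
    v: drift rate, a: threshold separation, z: starting point,
    dt: step size in seconds, s: noise coefficient.
    Returns (choice, decision_time); choice 1 = upper, 0 = lower boundary."""
    x, t = z, 0.0
    while 0.0 < x < a:
        x += v * dt + s * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return (1 if x >= a else 0), t

rng = random.Random(42)  # fixed seed for reproducibility
trials = [diffusion_trial(v=2.0, a=2.0, z=1.0, rng=rng) for _ in range(500)]
accuracy = sum(choice for choice, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(round(accuracy, 2), round(mean_rt, 2))
```

In this sketch, raising *a* should make responses slower but more accurate, while raising *v* should make them faster and more accurate, mirroring the interpretation of the parameters given above.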

*Testing and validating cognitive models which make response time predictions.* Finally, response time data are critical whenever a cognitive model explicitly or implicitly predicts certain response time patterns. The feature comparison model of categorization by Smith, Shoben, and Rips (1974) is a prominent example (see Figure 3.6). The model

Figure 3.6: A simplified representation of the feature comparison model of categorization by Smith et al. (1974). If the object is sufficiently similar or dissimilar to the category, Stage 1 suffices for a decision. Medium similarity, however, invokes Stage 2 and hence, requires more time.

assumes that in order to categorize a stimulus, its various features are compared with the typical or *characteristic* features of the category. Hence, in deciding whether a robin is a bird, you may quickly find the answer because the characteristic features of birds in general and a robin in particular show a large overlap (can fly, has feathers and a beak, lays eggs, builds nests).

However, when asked whether a penguin is a bird, the feature overlap is smaller (since penguins do not fly and do not necessarily build nests), and the model predicts that you focus on the *defining* features in a second step (e.g., has feathers and a beak, lays eggs), excluding the merely typical (but not necessary) features. This second comparison process consumes additional time; hence, positive instances of a category should be categorized faster the more characteristic features they share with the category (because this makes the second step unnecessary). *Negative* instances, however, should be correctly rejected faster the *fewer* characteristic features they share with the concept (e.g., "a whale is a bird" is denied more quickly than "a bat is a bird"). These quite complex predictions have been confirmed empirically, corroborating the feature comparison model (Rips et al., 1973).<sup>7</sup> A second example of how response time data have been used to validate cognitive models is described in Textbox 3.3.
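The model's two-stage logic can be sketched in a few lines of Python. The feature sets and similarity thresholds below are invented for illustration and are not taken from Smith et al. (1974).

```python
# A sketch of the two-stage feature comparison model; the feature sets
# and similarity thresholds are invented, not from Smith et al. (1974).
BIRD_CHARACTERISTIC = {"flies", "feathers", "beak", "lays eggs", "builds nests"}
BIRD_DEFINING = {"feathers", "beak", "lays eggs"}

def categorize(features, high=0.8, low=0.1):
    """Stage 1: fast global overlap with the characteristic features.
    Stage 2 (slower): only defining features, invoked for medium overlap."""
    overlap = len(features & BIRD_CHARACTERISTIC) / len(BIRD_CHARACTERISTIC)
    if overlap >= high:
        return "yes", "fast (stage 1)"
    if overlap <= low:
        return "no", "fast (stage 1)"
    is_member = BIRD_DEFINING <= features  # subset test on defining features
    return ("yes" if is_member else "no"), "slow (stage 2)"

robin = {"flies", "feathers", "beak", "lays eggs", "builds nests"}
penguin = {"feathers", "beak", "lays eggs", "swims"}
bat = {"flies", "fur", "nurses young"}
whale = {"swims", "nurses young", "lives in water"}

print(categorize(robin))    # ('yes', 'fast (stage 1)')
print(categorize(penguin))  # ('yes', 'slow (stage 2)')
print(categorize(bat))      # ('no', 'slow (stage 2)') - shares one feature
print(categorize(whale))    # ('no', 'fast (stage 1)') - shares none
```

Note how the sketch reproduces the response time predictions in the text: the penguin needs the slow second stage, and the bat is rejected more slowly than the whale.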

*Evaluation:* A precise cognitive theory or model should ideally make predictions about the (relative) duration of processes or tasks. Hence, as the above examples have shown, response times can yield valuable information for testing theories. Some early approaches to measuring process durations, such as Donders' (1868) and S. Sternberg's (1966, 1969) methods, rely on strict seriality assumptions which are sometimes questioned and hard to justify, since processes may operate in parallel (e.g., Ellis & Humphreys, 1999). In addition, the subtraction method often makes unrealistic demands on task construction. As the paradigm case of the DDM has shown, response times may also be a good indicator of ability, task ease, bias, and motivation if analyzed in the context of a model (see Donkin & Brown, 2018). Currently, promising general approaches are being developed that combine outcome-based measurement models (see Section 3.4.1) with response time data (Heck & Erdfelder, 2016), and more general approaches try to tackle the question of whether processes operate in parallel and whether they are self-terminating or exhaustive. Finally, for many applications in logical reasoning and problem solving, response times are simply a good indicator of task difficulty in addition to solution rates. Since they are easy to obtain in computerized experiments, this additional source of information should always be recorded.

<sup>7</sup> Corroborating a theory does not "verify" it. There may be even better theories that can explain the same data and make new predictions beyond the corroborated model.

### Textbox 3.3: Validating outcome-based strategy classification with response time data

In Textbox 3.2, we described how Bröder and Schiffer (2003) classified people as using the decision strategies TTB, WADD, TALLY, or GUESS based on the decision outcomes in a set of diagnostic tasks. Bröder and Gaissmaier (2007) reasoned that if the classification really reflected the processes assumed by the strategies, one should expect a specific response time pattern for each group classified as using a given strategy. Specifically, when people use TTB, they should need more time the more cues they have to retrieve from memory. Remember that TTB searches cues in order of decreasing validity and stops the search as soon as a discriminating cue is found. Hence, for TTB, we expect response times to increase with the position of the most valid discriminating cue. Since WADD and TALLY retrieve all four cues anyway, they should show no such increase in response times, or at least a much smaller one. WADD should generally take more time than TALLY, since it also weighs the cue information by validity, which TALLY does not require. Finally, GUESSing should be quickest of all, showing no systematic variation with cue position. As Figure 3.7 shows, the predictions were largely confirmed. Hence, the response time analysis lent additional credibility to the classification procedure that was initially based on decision outcomes alone.

Figure 3.7: Response time patterns for the strategy groups classified by Bröder and Gaissmaier (2007). Reprinted with permission.

#### 3.4.2.2 Monitoring Solution Steps and Information Search

With the rise of information processing models of thinking, problem solving research shifted to a type of sequential task that allowed the researcher to directly monitor the intermediate steps participants took to solve the problem. A famous example is the "Tower of Hanoi" problem, in which three (or more) discs of different sizes are stacked on one of three pegs. The person's task is to move the discs to the third peg according to two rules: first, never put a larger disc on top of a smaller one, and second, only move one disc at a time (see Chapter 9, "Problem Solving", Figure 9.5). A second famous example is the "hobbits-and-orcs" problem, where a boat with only two seats can be used to transfer three hobbits and three orcs across a river, following the rule that there must never be more orcs than hobbits on either side of the river at any time. Participants' solution steps can be filmed, recorded in protocols, or assessed by accompanying think-aloud protocols. These kinds of tasks thus make every intermediate state of the solution process directly observable.
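One appeal of such tasks is that the problem space is fully formalized. The classic recursive solution of the Tower of Hanoi, sketched below in Python, enumerates the optimal sequence of 2<sup>n</sup> − 1 moves against which participants' actual steps can be compared.

```python
# Recursive solution of the Tower of Hanoi: move n discs from peg
# "A" to peg "C" using "B" as a spare, in the minimal 2**n - 1 moves.
def hanoi(n, source="A", target="C", spare="B", moves=None):
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # clear the n-1 smaller discs
    moves.append((n, source, target))           # move the largest free disc
    hanoi(n - 1, spare, target, source, moves)  # restack the smaller discs
    return moves

moves = hanoi(3)
print(len(moves))  # 7 moves for three discs
```

Comparing a participant's recorded move sequence against this optimal path (e.g., counting deviations from it) is one simple way to quantify solution quality.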

Whereas this research strategy using sequential tasks with "observable steps" has proven fruitful, it is very restricted in scope. A somewhat more generally applicable approach is to monitor the *information search* prior to a problem solution or decision. In this paradigm, decision-relevant information is hidden from the participant's view and has to be actively uncovered or asked for. We will illustrate both a structured version, the *information board*, and an unstructured *open questioning* paradigm.

*Information search board.* The first applications of this method actually used information cards hidden in envelopes and laid out on a table or pinned to a board (e.g., Payne, 1976). With the advent of computerized experimentation, a so-called "MouseLab" version was first published by Payne et al. (1988), which presents information boxes on a screen that can be uncovered simply by clicking them with the computer mouse. This methodology is often used to investigate multi-attribute decisions, and it has meanwhile also been developed for use in Web-based studies (e.g., Willemsen & Johnson, 2019).

Figure 3.8: Example of a hypothetical MouseLab layout similar to the one used in the study by Schulte-Mecklenbeck et al. (2013) where participants could choose from two different meal options. All cells of the table were closed, and participants could acquire information by clicking on the cells. They are opened here only for illustration.

Figure 3.8 shows a typical display from Schulte-Mecklenbeck et al. (2013) in which the decision *options* are arranged in columns, whereas the *attributes* are arranged in rows. In this study, the participant had to choose between meals offered in a virtual canteen, each of which was described by the same set of attributes (price, calories, different nutrients). You may be familiar with these kinds of matrices from consumer reports, for example, in which several products are compared on various attributes. In an information board study, all information is initially hidden, and the decision maker can uncover the information she desires (sometimes incurring search costs) before finally making a decision. The information may remain visible after clicking, or it may disappear again when the cursor leaves the respective box; the latter procedure taxes working memory more heavily. As you can imagine, this procedure yields a wealth of information about the search, such as the search *sequence*, the *amount* of information searched, and the *time* spent inspecting each piece of information. Payne et al. (1988) collected various measures derivable from these data that are believed to reflect aspects of the decision strategy (see Textbox 3.4), in particular whether decision making tends to ignore information and focuses on comparing options on important attributes ("noncompensatory" decision making) or whether the strategy tends to use all information and compares overall evaluations of the options ("compensatory" strategies). Willemsen and Johnson (2019) report new developments for visualizing aspects of the search process in this paradigm.

*Unstructured open questioning formats.* The information board technique described in the previous section presents pre-structured information, which may create experimental demand effects by suggesting which kinds of information the experimenter deems relevant. It allows inferring the *relative* importance people place on attributes, but not whether they find them important in the first place. Huber, Wider, and Huber (1997) therefore developed a technique with quasi-realistic decision scenarios. After reading the scenarios (e.g., about the problem of saving an endangered turtle species), participants could ask for any further information they wished, receiving answers from a large set of predefined information. This procedure has repeatedly shown that participants tend to ignore probability information (Huber et al., 1997) and that they ask for information on how to eliminate risks (Huber, Bär, & Huber, 2009).

*Evaluation:* Observing the steps involved in thinking by monitoring corresponding behavior is one way to follow thinking processes more "closely". Monitoring stepwise problem solving is restricted to a very specific type of task, however. Another possibility is to register the information search processes prior to a decision or action, for example via MouseLab. As we have seen, this can yield a wealth of data that may inform us about the strategies people use. As a caveat, it should be noted that information *search* is not necessarily indicative of how the information is *integrated* (see Bröder, 2000); both may be quite different processes governed by different rules (Glöckner & Betsch, 2008). For example, one may look up all relevant information (seemingly indicating compensatory decision making) but decide to ignore most of it (leading to noncompensatory integration). Or one can decide in a compensatory manner without exhaustive search (if the remaining information could not reverse the decision anyway). Researchers do not always distinguish between search and integration, which may lead to misunderstandings in theory testing (Lohse & E. J. Johnson, 1996). Hence, when applying the methodology, it must be clear which part of cognition is under scrutiny. Finally, the active information search paradigm by Huber et al. (1997) has the advantage of not imposing experimental demands on the study participants, but it is a rather explorative method for generating rather than testing cognitive theories.

### Textbox 3.4: MouseLab Decision Strategy Indicators

Payne, Bettman, and E. J. Johnson (1988) and Payne (1976) derived various measures from the search sequences and inspection times of information in MouseLab, for example the *strategy index* SI (sometimes also called search index or PATTERN), which codes the relative amount of option-wise search (i.e., moving within an option to new attributes) versus attribute-wise search (comparing different options on the same attribute). Option-wise search is thought to indicate so-called *compensatory strategies* that use all information and compare overall evaluations of the options (examples are WADD or TALLY in the previous textboxes), whereas attribute-wise search is believed to reflect *noncompensatory strategies* that ignore information (such as TTB in the previous textboxes). If n<sub>o</sub> is the number of search transitions within an option to a different attribute and n<sub>a</sub> the number of transitions within an attribute to another option (transitions switching both option and attribute are ignored), the search index can be computed as

$$SI = \frac{n_o - n_a}{n_o + n_a},$$

and it varies from −1 to +1, reflecting pure attribute-wise and pure option-wise search, respectively. Böckenholt and Hynan (1994) proposed a modified version of the index for asymmetric options × attributes tables such as the one in Figure 3.8. The following table contains further measures and their interpretation.
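As a minimal sketch, the index can be computed from a recorded click sequence as follows; the cell labels and the example sequence are invented for illustration.

```python
# Computing the strategy index (SI) from a MouseLab click sequence;
# cell labels and the example sequence are invented for illustration.
def strategy_index(sequence):
    """sequence: list of (option, attribute) cells in inspection order.
    n_o counts option-wise transitions (same option, new attribute),
    n_a counts attribute-wise transitions (same attribute, new option);
    transitions changing both option and attribute are ignored."""
    n_o = n_a = 0
    for (o1, a1), (o2, a2) in zip(sequence, sequence[1:]):
        if o1 == o2 and a1 != a2:
            n_o += 1
        elif a1 == a2 and o1 != o2:
            n_a += 1
    return (n_o - n_a) / (n_o + n_a)

# Mostly attribute-wise search: compare both meals on price, then calories.
seq = [("meal1", "price"), ("meal2", "price"),
       ("meal2", "calories"), ("meal1", "calories")]
print(round(strategy_index(seq), 2))  # -0.33, predominantly attribute-wise
```

A purely option-wise sequence (all attributes of one option, then the next) would yield SI = +1, a purely attribute-wise one SI = −1.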


#### 3.4.2.3 Tracking of Eye Movements

A method which has gained popularity in recent years involves the registration of eye movements while thinking, based on the assumption that a person's momentary attention and focus of processing are reflected in his or her fixations on a stimulus. While early eye-tracking devices were expensive and intrusive, requiring people to have their head fixated (for example, by biting on a bite board) or to wear heavy helmets with cameras and contact lenses, new (and cheaper) devices allow for the remote monitoring of eye movements using infrared light reflected from the cornea, either in front of a computer screen or even in more natural environments (see Ball, 2014, and Russo, 2019, for brief introductions). Eye-tracking has been used extensively in research on reading and language comprehension, but it is also becoming increasingly popular in decision research and research on thinking (see Orquin & Loose, 2013). For example, when using an open information board, tracking the gaze sequence may yield information similar to that obtained with a MouseLab procedure.

The motor activity of the eyes consists mainly of *saccades*, quick movements during which no information is registered, and *fixations*, brief resting periods during which the viewer registers visual stimulus information (e.g., Holmqvist et al., 2011). Consequently, the sequence, number, average duration, and cumulative duration of fixations are of main interest to researchers.

For *explorative* (hypothesis-generating) research, several methods for visualizing the gaze behavior of participants exist. *Heatmaps* color-code the frequency of fixations to certain parts of the stimulus, and *scanpaths* contain additional information about the sequence and the duration of the fixations (see Figure 3.9 for examples of the same data presented as a heatmap or a scanpath). These visualizations are often used in applied research settings like usability and consumer research in order to optimize displays and ads.

In hypothesis-*testing* research, the stimulus display is typically arranged so that the important parts are clearly separated into areas of interest (AOIs) containing different aspects of the problem. For example, Figure 3.10 (left) shows a display with five letters, four of which form an anagram (= scrambled word puzzle) with a four-letter solution, the fifth letter being a superfluous distractor. The letters are widely distributed across the screen so that the letter a person is fixating at any given moment can be detected without error.

Often, processing hypotheses can be formulated such that different problem aspects are expected to receive more attention than others, which can be tested by comparing the number or duration of fixations in the respective AOIs. I will describe an example from problem solving research. To test whether people acquire solution knowledge even *before* they have a conscious insight into the correct solution, Ellis, Glaholt, and Reingold (2011) used anagram problems like the one depicted in Figure 3.10 and monitored eye movements during problem solving. The anagrams consisted of five letters, one of which was not part of the four-letter solution word. Participants were instructed to press a button as soon as they had found the solution word; in Experiment 1b, they additionally stated whether the solution had "popped up" in a sudden "aha" experience. Ellis et al. (2011) tested the hypothesis that participants would accumulate knowledge prior to finding the solution even if the solution appeared suddenly in their consciousness. This should be reflected in decreasing attention to the distractor relative to the solution letters. In fact, there was a significant tendency to ignore the distractor letter on average 2.5 s before participants announced they had found the solution, confirming the hypothesis of knowledge accumulation before conscious insight.

Figure 3.9: Heatmap and scanpath representation of the same eye-tracking data of a person in a decision trial. In this task, the options (columns) were card players, and participants had to predict their success based on the advice of experts (rows). In this trial, the participant focuses on the two leftmost options in a predominantly option-wise manner. (Data from Ettlin & Bröder, 2015, Experiment 4.)
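At its core, an AOI analysis reduces to summing fixation durations per region. The following sketch uses invented AOI boxes and fixation records, not the actual data of Ellis et al. (2011).

```python
# A sketch of a basic AOI analysis: sum fixation durations per area of
# interest. AOI boxes and fixation records are invented for illustration.
AOIS = {  # name -> (x_min, y_min, x_max, y_max) in screen pixels
    "solution_letters": (0, 0, 400, 300),
    "distractor": (500, 400, 700, 600),
}

def fixation_proportions(fixations):
    """fixations: list of (x, y, duration_ms) tuples.
    Returns each AOI's share of the total fixation time that fell
    into any AOI."""
    totals = {name: 0.0 for name in AOIS}
    for x, y, dur in fixations:
        for name, (x0, y0, x1, y1) in AOIS.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += dur
    grand = sum(totals.values()) or 1.0
    return {name: t / grand for name, t in totals.items()}

fixes = [(100, 150, 300), (350, 200, 250), (600, 500, 150)]
props = fixation_proportions(fixes)
print(props)  # most fixation time falls on the solution letters
```

Comparing such proportions across time windows before the solution response is essentially how the decreasing attention to the distractor was quantified.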

*Evaluation:* The tracking of eye movements has become cheaper, more user-friendly, and less intrusive in recent years. Holmqvist et al. (2011) give an extensive overview of theory and application. As we have seen, eye-tracking data can reveal a lot about the sequence of processing and the allocation of attention while thinking, and they can be used both in an explorative and in a hypothesis-testing fashion. The latter requires experimental setups with theoretically defined AOIs for which gaze durations and frequencies can be compared. Furthermore, important extensions are under development, such as the *memory indexing* method developed by Renkewitz and Jahn (2012). This ingenious idea is based on the "looking-at-nothing" effect first investigated by Richardson and Spivey (2000), which demonstrated that during memory retrieval, people tend to look at the location (on a computer screen, for instance) where they learned that information. This method therefore basically allows the monitoring of sequences of hidden memory processes by analyzing gaze data! A study by Scholz, Krems, and Jahn (2017) on (hypothetical) medical diagnoses not only replicated the looking-at-nothing effect but also showed that gaze behavior reflects the diagnosis currently most active in working memory and even allows the prediction of participants' final decisions. In addition, new software methods allow displays to be changed contingent on gaze behavior "on the fly" (e.g., Franco-Watkins & J. G. Johnson, 2011), thus opening new possibilities for experiments.

Figure 3.10: Top left: Anagram setup of Ellis et al. (2011); one letter does not belong to the four-letter solution word. Top right: Areas of interest (AOIs) from which fixations to the letters are recorded (not visible to participants). Bottom: Mean proportion of time spent looking at solution letters and the distractor prior to solution in Experiment 1b. ©Elsevier. Reprinted with permission.

There are a few downsides to the eye-tracking method, however. First, the connection between visual attention and gaze direction is not always as close as assumed, since spatial attention can also be directed to locations without moving the eyes. Second, many other factors (like salience or reading routines) influence our gaze behavior; thus, the data are often quite noisy, and it is not always easy to separate meaningful data from unsystematic variation. Third, depending on the quality of the equipment used, several participants often have to be excluded (e.g., those wearing glasses or contact lenses). Finally, at the time of writing, explorative rather than theory-testing applications seem to prevail in the literature, which may of course change in the future.

#### 3.4.2.4 Response Dynamics

A recent development pioneered by Spivey, Grosjean, and Knoblich (2005) uses characteristics of motor behavior (specifically, participants' hand movements) during a decision response to draw conclusions about internal thinking processes and their dynamics. Since most experiments use the computer mouse as the input device, this methodology has been christened *mouse-tracking*, although other devices have been used to record participants' hand movements as well (e.g., the Nintendo Wii Remote, a handle, or motion-capture systems). One assumption is that the decision evolves dynamically during the mouse movement, and its trajectory may therefore reflect the extent to which a decision conflict is present (Stillerman & Freeman, 2019). In a typical setup, each trial presents two choice options in the upper left and right corners of the computer screen. The participant initiates a trial by clicking a start button, typically placed in the neutral middle at the lower end of the screen (cf. Figure 3.11), upon which the decision-critical information is presented (either immediately, after a delay, or following an initial upwards movement; see Scherbaum & Kieslich, 2018, for a discussion of the different starting procedures and their consequences for mouse-tracking data). During the (sometimes speeded) response, the participant then chooses one option by clicking it while her mouse movements are continuously recorded. If the decision maker experiences a conflict between the options, the mouse path will probably not be perfectly straight but will be "drawn" a bit toward the competing alternative. Several measures can be derived to quantify this deviation; the simplest is the *maximum absolute deviation* (MAD) of the curved trajectory from the straight line leading to the chosen option. Figure 3.11 shows a typical display investigating the "Simon effect" along with visualized raw data as well as average trajectories from data published by Scherbaum et al. (2010).
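The MAD measure itself is straightforward to compute from a recorded trajectory; the points below are invented for illustration.

```python
# A sketch of the "maximum absolute deviation" (MAD) measure; the
# trajectory points below are invented for illustration.
import math

def max_abs_deviation(traj):
    """traj: list of (x, y) cursor positions from start click to choice.
    Returns the largest perpendicular distance of any recorded point
    from the straight line connecting the first and last point."""
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    return max(abs(dy * (x - x0) - dx * (y - y0)) / length
               for x, y in traj)

# An upward trajectory "drawn" toward the competing right-hand option
# before ending at the chosen option in the upper-left corner:
traj = [(0, 0), (10, 30), (30, 60), (10, 90), (-20, 120)]
print(round(max_abs_deviation(traj), 2))
```

Larger MAD values indicate trajectories that were pulled more strongly toward the non-chosen option, i.e., stronger response conflict.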

Although the method is quite new, it has been applied to a variety of domains, such as categorization tasks (animals, gender, race), spoken word recognition, risky decision making, word and sentence comprehension, truth judgments, social cognition, and more (see Freeman et al., 2011; Freeman, 2018). It provides a sensitive measure of conflict between response options. Furthermore, the exact analysis of the temporal dynamics in the trajectories (including speed and acceleration metrics) can even provide information about *when* the conflict arises, which can indicate whether a specific piece of information is processed earlier or later in the decision process (Dshemuchadse, Scherbaum, & Goschke, 2013; Sullivan, Hutcherson, Harris, & Rangel, 2015). For example, Sullivan et al. (2015) had their participants choose between food items they had previously rated on healthiness and taste. Independent of which food was chosen in a trial, the mouse trajectory was influenced by the taste difference earlier than by the healthiness information, indicating that the initial preference tendency is driven by pleasure, whereas health considerations come into play somewhat later in the decision process.

Figure 3.11: Top left: Exemplary mouse-tracking setup of Experiment 2 by Scherbaum et al. (2010) to investigate the Simon effect. Participants had to click a start button at the bottom center of the screen (dashed lines); when moving the cursor upwards, a number x appeared, and participants had to click left for x<5 and right for x>5. The presentation side of the number varied, creating congruent (x<5 left or x>5 right) vs. incongruent (x<5 right or x>5 left) trials. Top right: The summary mean absolute deviation of mouse trajectories demonstrates the Simon effect with greater average deviation for incongruent trials. Bottom: Individual and average (thick lines) mouse trajectories for congruent and incongruent trials (note that all trajectories were flipped to the left and only correct trials were analyzed).

*Evaluation:* The way in which participants move the mouse to choose an option is an unobtrusive method for revealing conflicting response tendencies. As the food choice example shows, even quite detailed information about the time course of processing can be gathered. Furthermore, easy-to-use implementation and analysis software has been developed, for example the mousetrap plugin for creating mouse-tracking experiments in the free and open-source graphical experiment builder OpenSesame (Kieslich & Henninger, 2017) and the mousetrap R package for analyzing and visualizing mouse-tracking data (Wulff, Haslbeck, Kieslich, Henninger, & Schulte-Mecklenbeck, 2019). As a relatively novel method, mouse-tracking faces a number of challenges. Many aspects of the design of mouse-tracking studies (e.g., the starting procedure and mouse sensitivity settings) require careful consideration to reduce the amount of noise in the data and to ensure that the decision process takes place during (and not before) the movement (e.g., Scherbaum & Kieslich, 2018). Also, averaged trajectories may be misleading and suggest a smooth curve when, in fact, they are averaged across different types of trajectories in different trials (Wulff et al., 2019). Finally, it is currently unknown whether cognitive conflicts always influence response dynamics and therefore how to interpret the *absence* of trajectory effects.
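For readers who want to see how such trajectory measures work, the curvature summary plotted in Figure 3.11 can be sketched in a few lines. The following toy Python example (the trajectory coordinates and function name are invented for illustration; dedicated packages such as the mousetrap R package provide tested implementations) computes the mean perpendicular deviation of a trajectory from the straight line connecting its start and end points:

```python
import math

def average_deviation(points):
    """Mean absolute (perpendicular) deviation of a trajectory from
    the straight line joining its start and end points: a simplified
    version of the curvature summary plotted in Figure 3.11.
    (Toy sketch; not taken from any published implementation.)"""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    # Perpendicular point-to-line distance via the 2-D cross product.
    dists = [abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in points]
    return sum(dists) / len(dists)

# A trajectory that bulges toward the alternative response option
# deviates more from the direct path than a straight movement.
direct = average_deviation([(0, 0), (0.5, 0.5), (1, 1)])
curved = average_deviation([(0, 0), (0.1, 0.9), (1, 1)])
```

A trajectory that bulges toward the non-chosen option yields a larger value than a direct movement, which is exactly the pattern that distinguishes incongruent from congruent trials in the Simon task.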

# 3.4.3 Computer Simulations

Beginning with Newell, Shaw, and Simon's (1958) work on a computer program later called the "General Problem Solver" (although it was rather limited in its abilities), cognitive scientists have attempted to formulate their theories in precise formal terms and to translate them into computer programs. The aim is to *simulate* human performance in cognitive tasks, including typical errors and fallacies or shortcomings in memory etc. Computer versions of theories are also termed computational models (Farrell & Lewandowsky, 2018). The scope of such models ranges from very specific theories about certain tasks to broad overarching "cognitive architectures" (e.g. ACT-R by Anderson et al., 2004) that entail many empirically informed constraints for modeling and predicting human behavior.

The advantages of formalizing theories and cognitive processes in such a way are manifold: First, the precision of the theory typically has to be increased; whereas verbal theories are often quite vague, an implementation on a computer demands precise concepts. Second, such a formalization may reveal inconsistencies in the theory that would have gone unnoticed otherwise. Third, in addition to predicting merely qualitative "effects" (e.g., the existence of group differences), precise models may even give quantitative predictions about effect sizes. Hence, in addition to the experimental tools researchers use to observe people's behavior, matching that behavior with computer simulations can reveal a lot about the validity of cognitive theories. We refer the interested reader to Farrell and Lewandowsky (2018) for an excellent introduction to cognitive modeling.
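As a concrete illustration of what such a computational model can look like, the following sketch simulates an evidence-accumulation (random walk) model of the kind mentioned in Section 3.4.2. All parameter values and names are arbitrary choices for demonstration, not estimates from any published study:

```python
import random

def simulate_trial(drift=0.1, threshold=10.0, noise=1.0, rng=random):
    """One trial of a minimal evidence-accumulation (random walk)
    model: noisy evidence drifts toward the correct response
    boundary; a response is given once either threshold is crossed.
    All parameter values are illustrative, not fitted to data."""
    evidence, steps = 0.0, 0
    while abs(evidence) < threshold:
        evidence += drift + rng.gauss(0, noise)
        steps += 1
    return steps, evidence > 0  # (response time in steps, correct?)

rng = random.Random(42)
trials = [simulate_trial(rng=rng) for _ in range(2000)]
accuracy = sum(correct for _, correct in trials) / len(trials)
mean_rt = sum(rt for rt, _ in trials) / len(trials)
```

Raising the threshold makes the simulated responses slower but more accurate, while lowering it does the opposite; the model thus yields quantitative predictions about the speed-accuracy trade-off that can be matched against observed data.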

# 3.4.4 Neuroscientific Methods

Since all our cognitive functions, including thinking, depend on *brain functions*, an ultimate understanding of cognition will have to include knowledge about these functions. The traditional approach of neuropsychology has gained many insights into the localization of cognitive functions in the human cortex by carefully assessing cognitive impairments caused by specific brain injuries. These investigations have inspired the view that the brain's architecture is largely *modular*, with certain modules being responsible for specific abilities.

In recent decades, brain imaging methods—mostly functional magnetic resonance imaging (fMRI)—have dramatically increased our knowledge about the brain structures involved in diverse cognitive tasks including thinking, although enthusiastic claims that fMRI can "watch the brain while thinking" are quite overstated (see Satel & Lilienfeld, 2013, for a critique). Basically, the standard fMRI method can contrast the metabolic activity pattern in the brain during a task with the activity pattern in another (control) task, and the regions with the greatest activity differences are probably involved in the processes that differ between the tasks. Hence, the experimental logic is quite similar to Donders' (1868) subtraction method for response times, and the better the tasks are chosen, the more meaningful the interpretation of the activation differences. In the last few years, complex statistical methods called *connectivity analysis* have also been developed which give very detailed information about the path and time course of activation that spreads through the brain during specific tasks (see Bastos & Schoffelen, 2016, for a review).

A wealth of knowledge about the brain structures involved in various cognitive activities has accumulated in the meantime, and a deeper treatment of neuroscientific methods is beyond the scope of this chapter. For the interested reader, I highly recommend Ward (2015) and Purves et al. (2013) for introductions to cognitive neuroscience.

# 3.5 Conclusion

The behaviorists believed that investigating thoughts and consciousness would require introspection and verbal reports, which are subjective and notoriously unreliable. Hence, they believed the mind to evade serious scientific investigation. As this chapter has shown, cognitive psychologists have proven this aspect of behaviorism to be blatantly wrong. Numerous innovative techniques that rely on objective data have been developed that shed light on the proverbial "black box" of the mind. As recent developments like response dynamics and eye tracking show, this development of clever methods is still going on, and it will without doubt help to reveal more fascinating insights into cognition in the future.

# Acknowledgements

I would like to thank Pascal Kieslich, Sophie Scharf, and Yury Shevchenko for helpful comments on a draft of this chapter.

#### Summary

How can theories about unobservable events like cognitive processes be tested and evaluated empirically? Since the method of introspection (self-observation) was criticized very early on for various reasons, cognitive scientists have developed a large toolbox of other methods that yield more objective data for testing theories about cognition. The idea behind this is that cognitive processes like retrieving a memory or solving a logical puzzle lead to observable consequences in behavior. The simplest methods just measure the outcome of a process, e.g., whether an item is solved or not. Depending on how precise the theory is, this can provide surprisingly detailed information about cognition. For example, items may be chosen such that different processes predict different solution patterns across these items, which may allow a strategy to be inferred. Another set of methods tries to tackle the underlying processes more closely, for example by dissecting response times or by monitoring information uptake with information boards or eye movement analyses. Also, movements during response generation can reveal conflicting response tendencies. Finally, theories about thinking and cognition can profit greatly from computer simulations and, of course, from neuroscientific research that investigates the neural underpinnings of the processes.

#### Review Questions


### Hot Topic: Single or multiple mechanisms in decision making?

Arndt Bröder

My research in the last two decades has been greatly inspired by research on "adaptive decision making" showing that people flexibly adapt their decision behavior to changing environmental demands, such as time pressure, memory retrieval demands, or payoff structures. The predominant view has been that we can choose from a large repertoire of qualitatively different strategies and heuristics that we employ under appropriate circumstances (see Textboxes 3.3 and 3.4). This idea of a strategy "toolbox" was especially promoted by Gigerenzer et al. (1999) and stimulated a lot of research. After developing methods for *diagnosing* these strategies in a valid manner (see Textbox 3.3), my further research investigated under which circumstances these strategies and simple heuristics are applied (see Bröder, 2012, for an overview). However, there are also critics of the toolbox metaphor, claiming that we might rather use a *single* mechanism for deciding, such as the evidence accumulation model described in Section 3.4.2 (Figure 3.5), and that widening or narrowing the gap between decision thresholds may just *mimic* the use of different strategies, although people merely change a parameter in a universal strategy. Both views are notoriously hard to differentiate empirically. In a series of elegant studies, my doctoral student Anke Söllner showed that the evidence accumulation view is indeed more plausible than the multiple heuristics view as a description of information acquisition (Söllner & Bröder, 2016). Recent joint work with colleagues favoring another "unified strategy" approach based on coherence-maximization principles also showed that predictions from this theory appear to explain search behavior better than the multiple strategies view (Jekel, Glöckner, & Bröder, 2018). The debate about which metaphor is more appropriate will probably continue for a while (see Marewski, Bröder, & Glöckner, 2018), but I always try to respect Konrad Lorenz's advice: "It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast. It keeps him young." *<sup>a</sup>*

#### References


*<sup>a</sup>* Lorenz, K. (1966/2002). *On aggression*. London: Routledge. (p. 9)

# References



*tion: An International Journal*, *20*(3), 768–776. doi:10.1016/j.concog.2010.12.007


ing. *Psychonomic Bulletin & Review*, *20*(4), 790–797. doi:10.3758/s13423-013-0389-0



*nizational Behavior and Human Decision Processes*, *68*(1), 28–43. doi:10.1006/obhd.1996.0087


Eidels (Ed.), *The Oxford handbook of computational and mathematical psychology* (pp. 35–62). New York, NY: Oxford University Press. doi:10.1093/oxfordhb/9780199957996.001.0001


*ogy: General*, *122*(2), 166–183. doi:10.1037/0096- 3445.122.2.166


# Glossary


# Chapter 4

# Concepts: Structure and Acquisition

KIMERY R. LEVERING & KENNETH J. KURTZ

Marist College & Binghamton University

A good way to begin thinking about the psychology of concepts and categories is by making some connections to other familiar and foundational elements of human cognition. Perception provides organized sensory impressions about the physical world. Memory contains a record of experience and a storehouse of what we know about the world. Reasoning is the process of going beyond available information to generate inferences or conclusions. How do concepts and categories fit in? One can convincingly argue that they tie these elements of our cognitive system together.

Perhaps the most fundamental and universal cognitive task is matching our perceptions of the environment around us with our knowledge in memory about the kinds of things that exist and the kinds of meaning that characterize scenes and situations. This knowledge is our set of concepts—the tools of thought or mental representations we apply to identify and understand a stimulus. From a memory perspective, it would take a lot of effort and capacity to remember (and treat as distinct) each of the seemingly infinite number of objects, people, places, and ideas in our environment. Instead, our cognitive system has the remarkable ability to organize our experiences in long-term memory, grouping instances together into one common concept despite the many ways they might differ. Every apple you encounter is a little different, but the commonalities shared across the category cognitively outweigh their differences enough to warrant grouping them together into a concept of *apple*.

As a result of classifying something we have never encountered before (e.g., recognizing an item on display in a grocery store as an *apple*), we do not need to figure out everything about it from scratch. We can assume that our category knowledge applies to this instance and a number of important consequences follow. We can access other knowledge that is connected to the category (e.g. trees, serpents, gravity, teachers, pies, etc.), we can communicate to others about it (e.g., "Hey, pass me that apple!"), we can reason about and predict characteristics that may not otherwise have been obvious (e.g., it tastes sweet and offers nutrients), and we can use the categorization toward further explanation (e.g., someone who orders an apple instead of fries is trying to be healthy). As Murphy (2002) wrote, concepts are "the glue that holds our mental world together" because of their role in virtually every cognitive experience we have.

Philosophers and other theorists have long reasoned about how people learn, represent, and use concepts, but in the latter half of the 20th century, psychologists began to collect empirical data from carefully controlled laboratory experiments to test theories grounded in the information-processing framework. As in other areas of the field, research has blossomed through the application of interdisciplinary approaches such as computational modeling. In this chapter, we will review theories, models, and behavioral data that have helped us to understand how concepts are acquired and structured.

It is understood that we do not come into the world as infants knowing what concepts like *fork* or *athlete* are. The rich knowledge we achieve about natural concepts comes about at least in part from experiencing examples and organizing them into groups (either on our own or based on what we are told). But what is the organizing basis that causes individuals or cultures to divide up the world as we do? What gives concepts their naturalness, their coherence, and their usefulness?

Most work in the field is consistent with the broad assumption that concepts emerge because the members of a category are like each other and different from other kinds of things. On this view, categories arise because there are regularities and a natural order in the world that can be discovered. It does not take any special work to invent categories—for example, apples are intrinsically like one another and unlike non-apples. The physical properties of objects as experienced through our senses are the grounding basis for categories. This idea of featural similarity has been defined in a number of ways, but it often refers to how many properties or features are shared (e.g., Tversky, 1977). For example, you would probably say that a dog is more similar to a wolf than a peacock in part because a dog and a wolf both typically have four legs, paws, fur, etc. while a dog and a peacock share far fewer characteristics. Another foundational approach to similarity is based on the geometric distance between items represented as points in a multidimensional psychological space (Shepard, 1957, 1987). To understand this, consider a cube where each interior point represents a value along each of three spatial dimensions (length, width, and depth). Shepard proposed that examples are represented as points in a multidimensional space corresponding to their values on the set of psychological dimensions along which examples vary (for example, apples may be defined in terms of roundness, redness, crunchiness, size, etc.).
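Shepard's geometric idea can be made concrete with a short sketch. In the following example, the items and their dimension values are entirely hypothetical, and the exponential decay of similarity with distance follows Shepard's (1987) proposal; the sensitivity parameter c is an arbitrary choice:

```python
import math

def distance(a, b):
    """Euclidean distance between two items represented as points
    in a multidimensional psychological space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b, c=1.0):
    """Shepard's (1987) proposal: perceived similarity falls off
    exponentially with distance in psychological space. The
    sensitivity parameter c is an illustrative choice."""
    return math.exp(-c * distance(a, b))

# Hypothetical coordinates on invented dimensions
# (e.g., leg count, furriness, colorfulness), scaled 0-1.
dog = (0.8, 0.9, 0.2)
wolf = (0.8, 0.9, 0.3)
peacock = (0.4, 0.1, 0.9)

dog_wolf = similarity(dog, wolf)
dog_peacock = similarity(dog, peacock)
```

Because the dog and the wolf occupy nearby points in this invented space, their similarity is high, while the dog and the peacock lie far apart and so are judged dissimilar, mirroring the intuition described above.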

When we experience a set of examples that are importantly alike (or when we are directly told that they belong to the same category), this experience invites a process of building up a general-level understanding that holds across these examples and supports generalization to new cases. This basis for category membership can be a set of features or dimension values that an item must be similar to—or it can be a rule that specifies exactly what features or dimension values are required for membership. There have been various attempts to describe how concepts arise from experience, and evaluating the relative merits of these theories has made up a considerable amount of the work in human category learning.

# 4.1.1 Concepts as Abstractions from the Data

Many theories of categorization assume that as you encounter examples from a category, you engage in a process of abstraction. This means that some detail about an example or collection of examples is lost and only the most important parts make up your concept. To understand abstraction, imagine being asked to draw a picture of your bedroom. Rather than a precise replica of the room, your picture would likely be simpler and contain fewer details. The exact number of dresser drawers, the color of your bedspread, and maybe even the presence of certain items might not be included in your drawing because you have either forgotten those details or don't consider them to be important. This is a gist-like representation of a single instance. To form concepts, the gist is formulated across many examples (other people's bedrooms) or at increased levels of abstraction (different types of rooms, interiors, physical environments, etc.). There are a number of ways that categories can be formed as abstractions, depending on the specific basis for what information to keep or discard.

#### 4.1.1.1 Abstracting Defining Features—Classical View

The first possibility considered was that concepts are formed by abstracting a fundamentally important characteristic or set of characteristics that all examples of a category have in common. For example, you may learn over time that to be a *grandmother*, someone must (1) be female and (2) have grandchildren. As long as someone meets those necessary (they must have these qualities) and sufficient (having just these qualities is enough) conditions for membership, they are a *grandmother*. Because all that is needed is satisfying some criteria, examples are either members of the category or not, and no example is any better or worse than any other. Acquiring a concept, then, is a process of gradually learning the essential properties that something needs to have in order to be considered a member.

This account of essential or defining properties has been around so long and was so popular in philosophy that it is often called the classical view (Smith & Medin, 1981). It wasn't until the mid-20th century that philosophers and psychologists began to take issue with some of its assumptions. First, it was argued that there are no perfect definitions for categories. Wittgenstein (1953) famously argued that the concept "game" cannot be defined by any set of necessary and sufficient properties, and he defended this claim against a number of possible attempts to do so (e.g., must a game involve competition? must a game involve winning or losing?). You might expect such definitions to be easier for taxonomic categories like animal species or chemical compounds, but it has been exceedingly difficult to come up with hard and fast definitions even for these types of categories. If a necessary characteristic of a dog is that it has four legs, does an animal stop being a dog if one of its legs is amputated? Objects not fitting a definition can also sometimes be considered members of a category. For example, Lupyan (2013) found that people were willing to call someone a "grandmother" even if they had no grandchildren. Second, there are many examples that do not seem to fit cleanly into one category or another. Medin (1989) gives the example of rugs, which could be considered members of the category *furniture* but do not seem to quite belong. Third, we see evidence of graded structure, meaning that some examples of a category are seen as better examples of that category than others. If you were asked to rate a list of fruit in terms of how typical they were of the category *fruit*, you would probably rate a banana as more typical than an avocado. This has been found consistently, even for categories thought to be the most well-defined. For example, Armstrong, Gleitman, and Gleitman (1983) found that certain examples of the category *even numbers* (e.g., 4) were considered to be better examples than others (e.g., 34). Such typicality effects are not easily explained by a theory that assumes examples to be simply in a category or not.

Figure 4.1: Difference between prototype and exemplar approach to the concept of *dog* arising from experiencing nine different dogs. Exemplar theory assumes the concept to be the collection of memories of each instance while prototype theory assumes the concept to be an abstracted example representing an average on relevant features.

## 4.1.1.2 Abstracting a Set of Common Features—Prototype Approach

In response to criticism of the classical view, a theory arose in philosophy (Wittgenstein, 1953) and later in psychology (Posner & Keele, 1968; Rosch & Mervis, 1975; Hampton, 1993; Smith & Minda, 2001) that while we do abstract the most common or central properties among category members, none of these properties are necessary or sufficient. In this set of views, eventually called the prototype approach, an item can be missing some features and still be considered a member of the category. Proponents of this view often think of concepts as boiling down to a single example, a *prototype*, that has the most common characteristics (e.g., has four legs) or the most common values along relevant dimensions (e.g., is 2.5 feet long). In Figure 4.1, the prototype is the average of the nine dogs experienced, even though that average is not exactly like any one of the dogs previously seen. In this view, we develop prototypes for every concept, and a new instance is then classified based on which category's prototype it is more similar to. This view is often thought to better describe natural categories, as members often share most but not all features, a property called *family resemblance*. This view is also considered more successful at explaining experimental findings such as unclear category membership (rugs just don't have many of the common features of furniture and are far from the category prototype) and typicality effects (items rated as less typical tend to possess fewer common features).
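Under the simplifying assumptions of hypothetical two-dimensional stimuli and plain Euclidean distance, prototype formation and classification as depicted in Figure 4.1 can be sketched as follows (all values are invented for illustration):

```python
def prototype(examples):
    """The prototype as the dimension-wise average of the category
    members experienced so far (cf. Figure 4.1)."""
    n = len(examples)
    return tuple(sum(values) / n for values in zip(*examples))

def classify(item, prototypes):
    """Assign the item to the category with the nearest prototype
    (plain Euclidean distance; a toy sketch, not a fitted model)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(prototypes, key=lambda cat: dist(item, prototypes[cat]))

# Hypothetical stimuli on two invented dimensions (e.g., size, snout length).
dogs = [(0.5, 0.4), (0.6, 0.5), (0.4, 0.45)]
cats = [(0.2, 0.2), (0.25, 0.15), (0.3, 0.25)]
prototypes = {"dog": prototype(dogs), "cat": prototype(cats)}

# A new animal is classified by its closeness to each prototype,
# even though it matches no previously seen example exactly.
label = classify((0.55, 0.5), prototypes)
```

Note that neither prototype is identical to any experienced example; the concept is the abstracted average, and the individual examples can be forgotten.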

#### 4.1.1.3 Abstracting a Boundary

Rather than developing a conceptual representation that is the center or average of a set of category members, other researchers have proposed that we instead update information about the boundaries of a category (Ashby, 1992; Ashby & Maddox, 1993). If the goal of concepts is to differentiate between types of things, perhaps the most important consideration is the partition line—where one category ends and another begins. For example, rather than seeing how similar a new banana is to your prototypes for the concepts *ripe banana* and *unripe banana*, we may simply use information about the point at which a banana goes from being classified as unripe to ripe along one or more dimensions. Knowledge of these partitions can identify examples of a concept without having to know anything specific about other examples or common/average features.
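In its simplest one-dimensional form, the boundary view stores nothing but a criterion. A toy sketch of the banana example (the criterion value is an arbitrary assumption):

```python
def classify_banana(ripeness, criterion=0.6):
    """Decision-bound sketch: only the partition along the ripeness
    dimension is stored; no exemplars or prototypes are consulted.
    The criterion value 0.6 is arbitrary."""
    return "ripe" if ripeness > criterion else "unripe"

labels = [classify_banana(r) for r in (0.2, 0.59, 0.61, 0.9)]
```

With multiple dimensions, the stored partition would instead be a line or curve through the stimulus space, but the logic is the same: classification depends only on which side of the boundary an item falls.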

# 4.1.2 Concepts as just the Data—Exemplar Approach

More recently, a set of theories has centered on the idea that we do not form abstractions at all but rather store specific information about examples themselves (see Figure 4.1). In other words, your concept of *apple* is made up of some version of a memory of every apple you have encountered (or at least the first or most prominent ones). New apples are recognized because they are highly similar to examples that have been thought of as apples before. In fact, the most successful explanations rely on the assumption that only the examples *most* similar to the new apple have influence on classification.

This exemplar approach (Medin & Schaffer, 1978; Nosofsky, 1984, 1986; Kruschke, 1992) can explain prototype effects related to typicality and fuzzy boundaries because examples that are dissimilar to prototypes are also frequently dissimilar to other examples in the category. *Rug* and *ostrich* would be considered poor examples of their respective categories because they are not highly similar to any other piece of furniture or bird. Formal versions of exemplar theory have been highly successful at predicting human performance, particularly in cases where there are not many examples to learn. These draw upon two main design principles. The first is that category representations are labeled exemplars that serve as reference points for similarity comparisons. When a new example is experienced, the model figures out how similar it is to the known examples it has stored, and bases classification on the category associated with the closest match. The second has to do with how similarity is computed. In the process of looking for particularly close matches, some dimensions may be treated as more important than others, a property known as dimensional selective attention. If we learn that size is useful when distinguishing between types of dogs, this feature should be given more influence than something less useful like number of legs. Selective attention is typically thought to happen during encoding (meaning the number of legs a dog has does not even register) but could also be applied at the point of making a decision (the number of legs registers but does not contribute to the decision of what type of dog it is). There is plenty of experimental evidence suggesting that we use selective attention when we are learning categories, although this tendency does not seem to be as central to categorization in infants and young children.
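Both design principles can be sketched together in a small example in the spirit of the Generalized Context Model (Nosofsky, 1986): the item's similarity to every stored exemplar is summed per category, and attention weights stretch the diagnostic dimension. All stimulus values and parameter choices here are invented for illustration:

```python
import math

def category_probabilities(item, exemplars, attention, c=2.0):
    """Sketch in the spirit of the Generalized Context Model
    (Nosofsky, 1986): summed exponential similarity of the item to
    each category's stored exemplars, with attention weights
    stretching diagnostic dimensions. Values are illustrative."""
    summed = {}
    for category, members in exemplars.items():
        total = 0.0
        for ex in members:
            # Attention-weighted city-block distance to one exemplar.
            d = sum(w * abs(x - y) for w, x, y in zip(attention, item, ex))
            total += math.exp(-c * d)
        summed[category] = total
    z = sum(summed.values())
    return {cat: s / z for cat, s in summed.items()}

# Hypothetical breeds on (size, leg count): size is diagnostic,
# leg count is not, so attention strongly favors size.
exemplars = {
    "great_dane": [(0.90, 1.0), (0.85, 1.0)],
    "chihuahua":  [(0.10, 1.0), (0.15, 1.0)],
}
probs = category_probabilities((0.8, 1.0), exemplars, attention=(0.9, 0.1))
```

Because attention weights size far more heavily than leg count, the two categories stay well separated even though every exemplar shares the same number of legs, which is the essence of dimensional selective attention.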

#### 4.1.3 Piecing Together Concepts

Much research in the last 50 years has been directed at evaluating whether concepts should be thought of as rules, prototypes, or a collection of exemplars, and evidence has been found in support of each account to differing degrees. Given that learning appears to vary in important ways across people, situations, and content, the category learning system could involve multiple processes or systems that invoke different underlying mechanisms. In line with this, several hybrid models have been developed, each asserting that information from separate systems is either combined, competes, or that a second system takes over when a primary system fails. One class of hybrid models assumes that concepts are acquired through a combination of learning rules for membership and storing individual examples (Erickson & Kruschke, 1998; Nosofsky, Palmeri, & McKinley, 1994). An approach that emphasizes separate neurobiological systems makes a strong distinction between an explicit verbal rule induction system and an implicit, procedural system (Ashby & Maddox, 2005). Similarity-based models have been developed that allow for both abstraction and exemplar-like effects by letting the model determine on the fly whether to represent the category with many clusters (a unique cluster for each item would be the exemplar approach), with one cluster (prototype view), or with an intermediate number of clusters (having a set of sub-prototypes to capture different aspects of the category; Love, Medin, & Gureckis, 2004). Another highly flexible approach is based on learning what configurations of feature values are consistent with each category—this involves no explicit use of rules or reference to specific exemplars or prototypes (Kurtz, 2007, 2015; see Hot Topic).

# 4.1.4 Explaining the Data

The approaches we have considered up to this point take the data about categories (i.e., the members of a category) as the direct basis for psychological representations of categories. This is most clearly evident in the exemplar view: the representation of a category consists strictly of the stored examples known to belong to the category. Abstractive accounts are based on finding a summary representation that captures the character of the category members without having to store them all. A rule is a representation that only requires storing the features that are necessary and sufficient for determining category membership. Instead of storing every example, the learner stores the information that must be true of each category member. A prototype is a statistical rather than logical form of summarization—instead of trying to summarize what is true of each example, the idea is to keep track of the central tendency among the examples. In this way, the nature of the category is captured by the set of feature values that are most representative of its members (i.e., storing a single canonical example—one that could be real or made-up—instead of storing them all).

Are there alternatives to category representations that use the examples or summaries of the examples as building blocks? Why might such alternatives be important? One important consideration is that the present approach assumes that the available data (the representations of each example) contain everything we expect our categories to contain. If that is so, where do these item representations that are as semantically rich as our concepts come from? For example, if our concept of apple is merely a representation of physical features, how can that explain other information about apples like their role in appreciating teachers, avoiding doctors, discovering gravity, worms, cider, pesticides, bobbing, pies, etc.? This issue becomes more extreme when considering categories that are even slightly more abstract (e.g., bag), where what makes examples similar is a construction rather than something directly derived from physical form. A promising proposal that has received only limited attention distinguishes between a *core* and an *identification* procedure for concepts (Miller & Johnson-Laird, 1976; Smith & Medin, 1981). The identification component is perceptually driven, while the core of the concept includes richly constructed semantic elements that arise from world knowledge and the interaction between humans and their environment.

Also in line with criticisms of similarity- or data-driven approaches is a theory-driven approach which considers categorization to be a process of explanation rather than similarity-based matching (Murphy & Medin, 1985). In this view, category representations are grounded in knowledge about what makes something a member that is not expressed in the same terms as item representation. In other words, a stimulus is not a chair because it has features that closely resemble the features of known chairs (or a summary of the features of known chairs); instead, the stimulus is a chair because the data (our sensory experience) is best explained in terms of the explanatory principles underlying chairs. What might such principles be? Researchers have looked to function and origin for such principles: Does it do what a chair should do? Was it built to be a chair? Is it used as a chair?

The classic example from Murphy and Medin (1985) asks how we categorize a fully clothed man in a pool. The suggestion is that we explain the available data in terms of the category of drunkenness by recognizing how explanatory principles like reduced coordination/judgment accord with what we see—it is not that we identify a close feature-by-feature resemblance between the man in the water and our prior experience of drunk people. The theory view of categorization provides an important critique of standard accounts: matching between stimuli and category representations requires solving the problem of identifying the "respects" for similarity—what are the features to compare upon and with what weights or importances?

In practice, researchers have had little success in translating this viewpoint into a mechanistic account of the processes and representations underlying categorization ability. Even so, much progress in the field can be seen as offshoots of the influence of the theory view. For example, an important idea arising in the field takes the perspective that categories are best represented as models of the statistical regularities that hold among category members, and the models are applied to categorize examples through a process of fitting the data rather than matching it (see Hot Topic). This resonates with a view that categories may be best understood in terms of schema theory as organized generic knowledge structures that can be activated and instantiated by filling slots with specific values (see Komatsu, 1992; Rumelhart, 1980). Another approach emphasizes the role of causal relationships in category learning and representation, for example the presence of wings on a bird and the bird's ability to fly (cf. Ahn & Kim, 2000; Rehder, 2003).

Murphy and colleagues have extended the impact of the theory view in a number of ways including a critique of the way category learning is typically studied in the laboratory that reinforces limited psychological accounts by excluding the critical role of prior knowledge about features, concepts, and general semantic memory (e.g., Murphy & Allopenna, 1994; Murphy, 2003; Wisniewski & Medin, 1994). Researchers have also been influenced by the theory view in expanding the problem of categorization beyond the ability to classify traditional taxonomic categories. There is a diversity of kinds of categories and a diversity of ways in which categories are learned and used (Markman & Ross, 2003; Medin, Lynch, & Solomon, 2000; Kurtz, 2015).

# 4.2 Modes of Category Learning

While the study of human category learning is ultimately about real-life concepts like *athletes* or *forks*, it is often difficult to answer questions about how *natural categories* like these are acquired because they have already been learned in unique and personal ways that cannot be easily controlled for. In order to get around this, cognitive psychologists create and teach artificial categories that can be more precisely controlled. These artificial categories are made up of members that participants have never seen before but that possess simpler versions of the

kinds of features that exist in the real world. Examples are grouped into categories by researchers, often according to the same kinds of principles that we think real categories are grouped by. Participants are then taught which category each example belongs to, imitating the process by which we learn about categories in the real world. What people learn about the categories can be assessed by having them decide what category some new item is in or by asking them questions about trained examples (*How typical is this example of its category?*), features (*What category is a winged creature most likely to be in?*), or relationships between features (*How likely are winged creatures to have webbed feet?*). Specific aspects of the task (the stimuli, which examples are in which category, how many categories, etc.) can be manipulated to see in what way those changes affect how easily categories are learned, what kind of information is remembered, or how that knowledge is applied.

# 4.2.1 Learning Concepts Through Classification

Most commonly, concept learning is studied through supervised category learning (see Figure 4.2), in which images are presented one at a time and learners decide which of (usually two) categories each belongs to. They are told whether they are right or wrong (this feedback is what makes the learning supervised) and over time they learn to correctly assign examples to the appropriate category, often with high accuracy. More than just memorizing what category each example is in, learners can pick up on relevant commonalities and differences between the categories, just as we learn what tends to be true of dogs and what distinguishes dogs from coyotes.

It is not hard to come up with real-life instances that align with this kind of learning. For example, imagine you see an animal running across your lawn and think that it is a coyote before your friend informs you that it is in fact your neighbor's dog, Fluffy. Although we can think of cases fitting this kind of guess-and-correct classification, it is not likely the only or even the primary way we learn. Concepts are most likely acquired through a combination of many modes of learning, in service of particular goals. What makes up your concept of dog likely comes from times when you knew something was a dog before you saw it (e.g., your friend invites you over to meet her new dog), made inferences about a dog that turned out to be true or not (e.g., you learn whether or not a dog will play catch), or learned about dogs incidentally while focusing on a specific task (e.g., picking out a pet from a pet store). Sometimes you may not even get feedback about whether your idea of category membership or predicted features is correct (e.g., you never find out whether the animal that ran across the lawn was a coyote or a dog).

Figure 4.2: Example of one trial of a supervised classification task. The participant views an example and decides which of two categories it is in before receiving feedback.
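The trial structure just described can be simulated in a few lines. The sketch below is purely illustrative (the binary stimuli and the simple tally-based learner are invented for this example, not a model from the chapter): a learner guesses a category, receives corrective feedback, and gradually picks up the feature regularities that separate the two categories.

```python
import random

# Hypothetical two-category stimuli: each example is a tuple of
# binary features, labeled "A" or "B" by the experimenter.
TRAINING_SET = [
    ((1, 1, 0), "A"), ((1, 0, 0), "A"), ((1, 1, 1), "A"),
    ((0, 0, 1), "B"), ((0, 1, 1), "B"), ((0, 0, 0), "B"),
]

def run_supervised_session(n_blocks=20, seed=0):
    """Simulate a learner who tallies feature-label co-occurrences
    and guesses the category whose tallies best match the stimulus."""
    rng = random.Random(seed)
    counts = {"A": [0] * 3, "B": [0] * 3, "nA": 0, "nB": 0}
    accuracy_per_block = []
    for _ in range(n_blocks):
        correct = 0
        examples = TRAINING_SET[:]
        rng.shuffle(examples)
        for features, label in examples:
            # Score each category by how well the observed features
            # match the stored feature frequencies for that category.
            def score(cat, n_key):
                n = max(counts[n_key], 1)
                return sum(f * c / n + (1 - f) * (1 - c / n)
                           for f, c in zip(features, counts[cat]))
            guess = "A" if score("A", "nA") >= score("B", "nB") else "B"
            correct += (guess == label)
            # Corrective feedback: update tallies for the true category.
            counts[label] = [c + f for c, f in zip(counts[label], features)]
            counts["n" + label] += 1
        accuracy_per_block.append(correct / len(examples))
    return accuracy_per_block

acc = run_supervised_session()
print(acc[0], acc[-1])  # accuracy early vs. late in learning
```

With these stimuli, the first feature perfectly separates the categories, so accuracy rises to ceiling across blocks—mirroring the high accuracy learners often reach in such tasks.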

# 4.2.2 Learning Concepts Through Inference, Use, and Observation

Research has provided evidence that differences in the way a concept is learned matter. Oftentimes, when a learning task is changed, different kinds of information are acquired. For example, when participants learn by predicting features of labeled examples, they often learn more about the most common features and the relationships between features (Markman & Ross, 2003; Yamauchi & Markman, 1998). Facts about which features "go together", or which are more typical or central, are aspects of the internal structure of categories. Knowledge of internal structure gives us a sense of what is generally true of a category, sometimes above and beyond what is necessary to figure out what something is. For example, the fact that silverware is typically made of metal may be useful to learn even if it does not help you determine whether something is a spoon or a fork. In addition to *inference learning*, internal structure is also better learned through *indirect learning tasks*, where organization into categories helps to accomplish some goal, like predicting how much food animals would eat, but categories are not explicitly taught (Minda & Ross, 2004). It is also better learned in *observational tasks*, where category labels are provided before the example is shown and guessing is not necessary (Levering & Kurtz, 2015). In essence, task demands during learning influence what is attended to and what becomes more central to the representation of a category. When a task focuses the learner on classification, the learner focuses on the information necessary for classification; when that focus is removed, more robust knowledge of internal structure can be acquired. Because categories in the real world are used for a multitude of different tasks, developing robust categories through multiple modes of learning is essential.

### 4.2.3 Organizing our Own Concepts

In many cases, we cannot rely on category membership being explicitly defined for us but rather we must organize our observations into categories using our own heuristics. For example, your concept

of music genres (e.g., *classical music* or *hip hop*) has probably not come from listening to carefully labeled songs and learning the features associated with each genre. While some experiences may have been labeled for you (e.g., you hear a song while listening to a country radio station), you have largely constructed your own organization based on unlabeled examples. Research into purely *unsupervised* classification is often difficult because there are so many ways that a set of items can be organized. One common finding emerging from this research is that when asked to sort items into categories, people tend to focus on forming rules along single dimensions (e.g., Medin, Wattenmaker, & Hampson, 1987). For example, you may decide that any song sung with a southern twang is country music, without needing to consider any other dimension.

Rather than being completely unsupervised, our learning is often *semi-supervised*, meaning that we experience a combination of labeled and unlabeled examples. Studies on the role of unlabeled examples (relative to completely supervised learning) have been mixed, sometimes showing that they are helpful, sometimes harmful, and sometimes that they have no effect. Recent research has suggested that labeled cases are important when categories are highly similar and category membership is therefore ambiguous. For example, it would be useful to have some labeled cases when distinguishing subtle differences among types of electronic music, but not when learning the broad difference between classical and punk music (Vong, Navarro, & Perfors, 2016).

Even when learning about a concept is supervised, it is sometimes possible for us to decide which examples we want to learn about and when. For example, on a trip to the zoo, a child may ask a parent to label certain unknown examples ("antelope?") but not others. This self-directed learning (also known as active or selective learning) is thought to be more effective than passive (receptive) learning, particularly when category distinctions are based on simple rules (Bruner, 1961; Markant & Gureckis, 2014). Differences in how people learn in these modes can be simulated in the lab by having one group of participants construct or select specific examples to learn about while another group is either given a random presentation order or a presentation order

that matches a participant in the first group (this is called a *yoked design*). In these kinds of studies, the participant who made the selections often learns the categories better despite being exposed to the exact same examples as their yoked counterpart (Schwartz, 1966). Possible reasons are that self-directed learning is more engaging, results in deeper processing and better memory for examples, or allows for more focused attention oriented toward testing specific hypotheses about category membership (see Gureckis & Markant, 2012, for more information).

#### 4.3 Kinds of Categories and Their Uses

An important early contribution in the empirical investigation of category structure was the finding that categories are organized at different hierarchical levels that serve different purposes—and specifically that an intermediate level, known as the basic level of categorization, appears to play a foremost role in guiding the way we access and use categories (Rosch & Mervis, 1975). Very specific categories (waterbuck antelope) capture tightly knit knowledge reflecting a large overlap in the features that each member has. This means that a great deal can be inferred with high confidence about a member of such a category. Very broad categories (mammal) are based on only a few core common properties that carry a great deal of weight in organizing knowledge, but do not provide much specific information about their members. The basic level (antelope) provides a compromise of reasonably high resemblance between members of a single category and low resemblance between members in different categories. Therefore, the basic level of categorization may be our most fluid and task-general way of making sense of everyday experience. Interestingly, the level of categorization that is privileged may not always be the basic level—instead it varies depending on factors including age, domain expertise, cultural norms, and the goals or tasks for which the category is being used (see Medin & Atran, 2004; Tanaka & Taylor, 1991).

As discussed above, the theory view suggests that concepts may not be sufficiently grounded by physical similarities (see Goldstone, 1994). This may or may not apply to ordinary entity concepts like dog and chair, but it has become clear that there are important kinds of categories that are certainly not subject to traditional similarity (high levels of match between features) as an organizing principle.

Barsalou (1983, 1985) demonstrated the existence and psychological role of *ad-hoc* categories that are generated in the moment (e.g., things to take out of a house in case of fire) as well as more stable categories that are *goal-derived* (e.g., things to eat on a diet). Critically, the members of these categories lack any traditional featural similarity to one another but do cohere systematically around functional *ideals* or goal-relevant properties (e.g., zero-calorie). More broadly, the term *relational* has been proposed (Gentner & Kurtz, 2005; Markman & Stillwell, 2001) to describe categories based on how objects relate to one another within scenes or situations. For example, an 'obstacle' is a category that can take nearly any concrete or abstract form, but that coheres around fulfillment of a relationship wherein one entity blocks the progress of another. Relational categories are grounded in structure-mapping theory (Gentner, 1983), which specifies how the alignment of structured representations (entities organized by filling roles in relations) drives psychological similarity. On this view, much of the meaning that people represent about the world is more complex than simple objects and requires specifying which elements relate to which other elements, and in what ways. A great deal of empirical evidence shows that comparison processes (analogy, similarity, metaphor) play a major role in human cognition, and operate based on a search for identical sets of connected relationships between cases (see Gentner, 1983). Researchers are pursuing the study of relational categories with an important emphasis on real-world learning, where challenges include mastering foundational concepts in formal instructional settings and promoting successful use of acquired knowledge when the context or surface-level form is not the same (Goldwater & Schalk, 2016; Kurtz & Honke, 2017; Loewenstein, 2010).

# 4.4 Future Directions in Concepts

While scientific progress toward an understanding of how people learn, represent, and use categories has been considerable, there remain significant frontiers and challenges. One is that researchers have found a number of explanatory principles that do a good job of accounting for at least some part of the overall problem, but it is not clear whether the categorization system is deeply multi-faceted (i.e., variable across domains, settings, learners, etc.) or whether the range of performance characteristics reflects different manifestations of a single universal, highly flexible mechanism. Another major challenge is unifying our account of real-world, everyday categorization with advances made using highly artificial tasks in the laboratory. Lastly, there is an important need for synthesis and integration of data and theory from perspectives outside of the core approach that have produced largely siloed progress. For example, developmental psychologists have made important

progress in understanding the transitions from infant to child to adult forms of categorization (Carey, 2009; Keil, 1989; Sloutsky, 2010), but there is limited cross-talk despite the obvious value to be gained. Similarly, a subset of researchers has focused on neurobiologically oriented accounts of categories and concepts, with only pockets of mutual impact between the approaches (e.g., Ashby & Maddox, 2005; Barsalou et al., 2003; Tyler & Moss, 2001). In addition, a set of mathematically formulated accounts of concept formation exists as a largely independent enterprise (Feldman, 2000; Pape, Kurtz, & Sayama, 2015; Vigo, 2013). We end by noting an emerging counter-example: the burgeoning field of machine learning/data science, in which classification is one of the core problems addressed. In a promising development, researchers are increasingly finding value in drawing upon and contributing to research on the learning and representation of categories in both humans and machines.

#### Summary


#### Review Questions


### Hot Topic: Categorization as finding the best account of the data

Kimery Levering

Rather than using similarity to reference points, the theory view suggests that items are categorized based on how well the item's features are explained by a category. This notion of "well-explained" can be realized without departing from the realm of data. For example, one could compute the likelihood of an item having the features that it does if it were a member of a particular category. This conditional probability is based on knowing how many category members have each feature (e.g., having spots) versus not. Following Bayes' Theorem, instead of using the features to directly predict the category, one uses the likelihood of dogs having spots (and the other observed features of the target)

to predict how well the category fits the example. If the example has features that occur frequently among dogs and the category itself is sufficiently common then that is strong evidence of membership. Anderson (1991) proposed a *rational* account in which the goal of categorizing is to make the most accurate possible inferences given the data. In this way, categorization is explained as forming clusters (neighborhoods) of the items in a domain and then predicting the category based on how likely each item feature is relative to each cluster combined with the likelihood of the category within each cluster. Criticisms of this approach include evidence that people make predictions based

on one assigned category rather than by combining likelihoods arising from each possible category, evidence that people do not treat category labels as just like any other feature to be predicted, and the issue that the Bayesian foundations underlying this account implausibly assume feature independence.
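As a rough illustration of the Bayesian idea (all feature frequencies and priors below are made up for the example, not taken from any study), one can combine a category's prior probability with per-feature likelihoods, under the very feature-independence assumption criticized above:

```python
# Hypothetical feature frequencies: the fraction of known category
# members observed to have each feature (assumed for illustration).
feature_given_category = {
    "dog": {"spots": 0.30, "barks": 0.95, "four_legs": 0.99},
    "cat": {"spots": 0.10, "barks": 0.01, "four_legs": 0.99},
}
prior = {"dog": 0.5, "cat": 0.5}  # how common each category is

def posterior(observed_features):
    """Bayes' theorem with the (criticized) assumption that
    features occur independently within a category."""
    score = {}
    for cat, freqs in feature_given_category.items():
        likelihood = 1.0
        for feature, freq in freqs.items():
            # Multiply P(feature | category) if present, else P(absent).
            likelihood *= freq if feature in observed_features else (1 - freq)
        score[cat] = prior[cat] * likelihood
    total = sum(score.values())
    return {cat: s / total for cat, s in score.items()}

p = posterior({"spots", "barks", "four_legs"})
# A spotted, barking, four-legged animal is far more likely a dog.
```

If the example has features that occur frequently among dogs, and the dog category itself is sufficiently common, the posterior for *dog* dominates, which is the sense in which the data provide strong evidence of membership.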

Kenneth Kurtz

Fortunately, there is another way to determine how "well-explained" an item's features are relative to a category. Kurtz (2007) proposed that categories can be understood in terms of: (1) a transformation function instantiated as a set of synapse-like connection weights between a layer of neuron-like nodes that encode the input feature values and a "hidden" layer that recodes the information in an internal learned feature space; and (2) reconstruction functions that predict what item features are most likely with respect to each category. The paired functions represent category knowledge in the form of expectations about what configurations of feature values are consistent with membership. Error-driven learning adjusts the function pairs to work harmoniously for items that belong in each category. When an item is consistent with these expectations, it passes through the functions relatively unchanged, but

when input feature(s) are inconsistent, the functions yield reconstructive distortion—the expected features do not match the observed ones. The amount of such distortion indexes the likelihood of membership. When a cat is evaluated as a dog, the result is a shift toward category expectations (i.e., bigger size, barking call, greater sociality), and this degree of distortion indicates poor category fit. A connectionist model called DIVA (see Figure 4.3) based on these principles provides a better account of human categorization on some critical tests than reference point models (e.g., Conaway & Kurtz, 2017).

Figure 4.3: The structure of the connectionist model DIVA (Kurtz, 2007). In this example, a stimulus (three input features) is best reconstructed through the *dog* channel and so the model would classify it as a dog.
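The reconstruction idea can be caricatured in a few lines. The sketch below is a drastically simplified stand-in for DIVA (hypothetical feature values; the actual model learns nonlinear transformation and reconstruction functions through error-driven training): each category channel is reduced to a fixed set of feature expectations, and an item is assigned to the channel that distorts it least.

```python
# Hypothetical expected feature values per category channel:
# [size, bark_call, sociality], each on a 0-1 scale (invented numbers).
category_expectations = {
    "dog": [0.8, 0.9, 0.9],
    "cat": [0.4, 0.1, 0.4],
}

def reconstruction_error(item, expectations):
    # Squared distortion between the item and what the channel expects.
    return sum((x - e) ** 2 for x, e in zip(item, expectations))

def classify(item):
    """Assign the item to the category whose channel reconstructs it
    with the least distortion (the DIVA-style decision rule)."""
    errors = {cat: reconstruction_error(item, exp)
              for cat, exp in category_expectations.items()}
    return min(errors, key=errors.get), errors

label, errors = classify([0.45, 0.15, 0.5])  # a cat-like stimulus
```

Evaluating the cat-like stimulus through the *dog* channel yields a large distortion (a pull toward bigger size, barking, and sociality), while the *cat* channel leaves it nearly unchanged, so the item is classified as a cat.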

#### References


#### References Levering & Kurtz



# Glossary


# Chapter 5

# Knowledge Representation and Acquisition

ARTHUR C. GRAESSER, ANNE M. LIPPERT, & KEITH T. SHUBECK

University of Memphis

This chapter discusses how knowledge is represented in our minds when we learn about new topics in school and life. How do we encode and think about subject matters in fields as diverse as psychology, literature, art, history, biology, physics, mathematics, and computer technology? The knowledge representations and reasoning in these fields often differ (Goldman et al., 2016). In psychology and physics, we think like a scientist. We think about hypotheses and how to test them by collecting data in experiments. In mathematics, we puzzle over formulas and proofs. In literature, we construct imaginary worlds in our mind that may or may not correspond to anything in the real world. In computer technology, we think about procedures for running programs that perform some practical task. The representations and ground rules for thinking are quite different in these different disciplines.

There are multiple ways to represent experiences and topics of interest. Popular music is a great example of this. Consider how people represent music when they listen to songs such as *Hey Jude* by the Beatles, *Crazy in Love* by Beyoncé, or *Yankee Doodle*. Some have representations that focus on the melody, others the lyrics, others the emotions, others visual images, and others the rhythm and meter that inspire dance or other forms of physical motion. Most of us have mental representations with some combination of these dimensions. There is no right or wrong representation, but memory for the songs is influenced by the nature of the representations

that people construct (Rubin, 1995). Psychologists in the learning sciences investigate the nature of the representations that we construct when we learn new topics and use the knowledge when performing tasks.

Mental representations of what we perceive are not perfect copies of the world out there. The mental representations we construct about the world are simplifications that often have errors and distortions. As an interesting exercise, draw from memory a floorplan of your home, with the various doors, windows, and pieces of furniture. Then compare the sketch with your actual home and note the differences. Or if you prefer, sketch your town with the streets and landmarks. Although you have experienced your home and town for thousands of days, there are still distortions. Psychologists in the cognitive sciences investigate theories about the properties of these mental representations and conduct experiments to test the theories.

This chapter identifies some of the theories of representation that cognitive and learning scientists have developed. Their goal is to explain how children and adults represent knowledge during learning. The focus of this chapter is on learning when adults acquire subject matters in schools, the workforce, and their personal lives. In contrast, Chapter 4 ("Concepts: Structure and Acquisition") and Chapter 17 ("Development of Human Thought") take on the development of representations in infants and children. Our emphasis is also on deeper levels of comprehension and learning (Millis, Long, Magliano, & Wiemer, 2019). A recent report by the National Academy of Sciences, Engineering and Medicine on *How People Learn* (volume 2, 2018) contrasts six basic types of learning: habit formation and conditioning, observational learning, implicit pattern learning, perceptual and motor learning, learning of facts, and learning by making inferences from mental models. This chapter emphasizes the learning of facts and making inferences from mental models, although the other types of learning are sometimes very relevant.

Instructional media and technology will play an important role in this chapter because they dominate the world we live in today. Media and technology shape how we think and represent information. For example, a few decades ago it would have taken days to find an answer to a question as people walked to libraries, to card catalogues, to stacks of books, and searched pages and paragraphs for an answer. The same question can now be answered in seconds on the computer. We expect swift answers to questions and get irritated by delays. A decade ago students submitted essays for grading and waited days or weeks for a grade. Now essays can be graded immediately with validity comparable to that of experts (Foltz, 2016). We now live in a world of intelligent tutoring systems that tailor learning to the individual student (Graesser, Hu, & Sottilare, 2018) and computer environments where groups of people can learn and solve problems together (Fiore & Wiltshire, 2016). We now live in a world where facts need to be checked for misinformation and contradictions (Rapp & Braasch, 2014), and only technology has the capacity to do so at scale. We live in a world of media, games, and edutainment. These seductions appeal to our motivations and emotions and run the risk of competing with the learning of important subject matter. All of these advances in media and technology influence how we represent and acquire knowledge.

#### 5.1 Knowledge Components

The first approach to representing subject matter knowledge consists of a list of knowledge components. A knowledge component is much like a sentence that expresses a particular idea that is important to know about a topic. Example knowledge components in psychology can be captured in such expressions as "absence makes the heart grow fonder" (as the opposite of "out of sight, out of mind"), "team members in groups may not respond because they expect other members to respond", or "correlation does not imply causation." An example in physics is "force equals mass times acceleration", whereas an example in mathematics is "the circumference of a circle is pi times the diameter." Some knowledge components are if-then rules with contingencies: "If a person has XX chromosomes, they are female; if a person has XY chromosomes, they are male." The subject matter on a topic may consist of a long list of dozens to hundreds of knowledge components. As students learn a subject matter, students and teachers often cannot tell how well performance on these knowledge components is progressing. However, computers can track this progress for individual students in intelligent tutoring systems (Graesser, 2016; Koedinger, Corbett, & Perfetti, 2012) and for individuals and groups in team learning (von Davier, Zhu, & Kyllonen, 2017). When the computer determines that enough of the knowledge components have been learned by the student, the system decides that the student has mastered the topic.

How does the student, instructor, or computer know whether a knowledge component (KC) has been mastered? The answer is debatable. Consider once again the knowledge component "team members in groups may not respond because they expect other members to respond." How would one know whether this KC has been mastered by a learner? There are many possible operational definitions. Can the learner recite the KC in words that have the same meaning as the KC? Does the learner send important requests to individuals rather than groups in social communication systems (knowing that there may be diffusion of responsibility in groups)? Mastery of some KCs may be reflected in a number of cognitive measures, such as response times to requests, eye movements, and neuroscience indicators (see Chapter 3, "Methods for Studying Human

Thought"). Individual learners may differ in how they behaviorally show mastery of a particular KC. They may exhibit mastery in words, drawing figures, gestures, problem solving, or other actions.

Mastery of knowledge components improves over time if there is knowledge acquisition. Computers can track this. Suppose a computer tracks whether or not a student on a KC has a successful response (1) or an unsuccessful response (0) over 8 episodes of being assessed. The following sequence would reflect successful learning on assessment episode number 4: 00011111. The sequence 01010101 shows no learning because the number of 1's is the same for the first four episodes and the second four. Probabilistic learning is reflected in 00101011 because there is only one 1 among the first four episodes but three 1's in the last four episodes. Mastery of a topic is achieved when many of the KCs are mastered in performance assessments.
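The first-half/second-half comparison implicit in these example sequences can be computed directly. A minimal sketch (the function name and thresholding are ours, not from any tutoring system):

```python
def learning_summary(responses):
    """Compare success rates in the first and second halves of a
    string of assessment outcomes ('1' = success, '0' = failure)."""
    half = len(responses) // 2
    first = responses[:half].count("1") / half
    second = responses[half:].count("1") / (len(responses) - half)
    return first, second

# The three sequences from the text:
print(learning_summary("00011111"))  # (0.25, 1.0): clear learning
print(learning_summary("01010101"))  # (0.5, 0.5): no improvement
print(learning_summary("00101011"))  # (0.25, 0.75): probabilistic gain
```

A tutoring system could flag a KC as mastered when the second-half success rate exceeds some criterion, which is one simple way to operationalize the progression described above.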

# 5.2 The Representation of Knowledge Components

The mastery of a knowledge component depends on how it is represented and how strict the criterion for mastery is. A precise standard for a verbal representation would be an exact match between the expected knowledge component and the student's language. However, it is important to match on meaning rather than precise wording (Kintsch, 1998). There are many ways to articulate "team members in groups may not respond because they expect other members to respond" in particular contexts, such as "there is diffusion of responsibility in the group", "tell John personally because he expects others on the team to handle the task", or "the likelihood of a team member completing an assigned task is lower than when an individual is assigned the task." How can one determine whether these answers match the KC when they are worded so differently? Computers have made major advances in evaluating the accuracy of semantic matches in a field called computational linguistics (Jurafsky & Martin, 2008), but they are far from perfect. Expert human judges have moderate agreement on whether

two sentences have the same or different meanings.
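A toy calculation makes the difficulty concrete. The sketch below scores similarity by word overlap alone (a deliberately crude measure; real computational-linguistics systems also handle synonyms, syntax, and discourse context), showing how a valid paraphrase of the example KC gets almost no credit:

```python
from collections import Counter
from math import sqrt

def cosine_word_overlap(a, b):
    """Crude lexical similarity: cosine between word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

kc = ("team members in groups may not respond "
      "because they expect other members to respond")
paraphrase = "there is diffusion of responsibility in the group"

# Lexical overlap is near zero even though the meaning matches,
# which is why matching on meaning rather than wording is hard.
score = cosine_word_overlap(kc, paraphrase)
```

The only shared word is "in", so the score is close to zero despite the two sentences expressing the same knowledge component; this is the gap that semantic matching must close.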

Multiple levels of language and discourse need to be considered when deciding whether two verbal expressions have the same meaning (Pickering & Garrod, 2004; McNamara, Graesser, McCarthy, & Cai, 2014). We need to consider whether the words have the same or similar meaning. For example, the phrase "team members in groups" is very similar in meaning to "people in groups" in the example KC, but not to "sports in groups." Syntax and word order matter when interpreting meaning. The phrase "team members in groups" is quite different in meaning from "to members group in teams" or the nonsensical expression "groups team in members." The discourse context also needs to be considered when deciding whether two sentences have the same meaning. The expression "absence makes the heart grow fonder" makes sense in a psychology class when debating whether a romance will survive after two lovers part for a few months. It does not make sense when a student tries to explain to an instructor why an exam was missed.

Mastery of a knowledge component is manifested in its meaning rather than its precise surface structure (i.e., wording and syntax). People tend to retain the meaning of ideas in long-term memory rather than their surface structure (Craik & Lockhart, 1972). Surface structure is normally short-lived, lasting a minute or less, whereas semantic meaning lasts a long time. Therefore, verbal memory assessments of how well a student has mastered a subject matter need to consider the meaning of the KCs rather than the exact wording. An essay test that taps meaning is superior to a test of verbatim recitation.

Mastery of a knowledge component is often manifested nonverbally. Actions, facial expressions, eye movements, pointing gestures, and other behaviors can signal mastery. Consider the KC that "some chemical sprays from groundskeepers cause people to sneeze." When someone starts sneezing, this KC is likely to have been mastered if the person gets up and looks out the window, glares in contempt at the groundskeeper, points to the groundskeeper, closes the window, and/or puts on an allergy mask. There is no need to articulate the KC in words.

Figure 5.1: Four different types of knowledge structures: Taxonomic, spatial, causal, and goal-action procedures.

# 5.3 Knowledge Structures

Our description of the knowledge component representation does not take into consideration the structural relations between ideas. This section emphasizes these relational connections. Four types of structures are discussed here to illustrate the importance of relations. These are shown in Figure 5.1: taxonomic, spatial, causal, and goal-action procedures. There are many other types of knowledge structures, such as organizational charts of positions in a corporation and the lineage in family trees. All of these knowledge structures emphasize how knowledge is interconnected and that ideas close to each other in the structure are more conceptually related than ideas far away. When an idea is activated during learning, it tends to activate its nearby neighbors in the structure more than neighbors far away (Collins & Loftus, 1975).

There is a terminology that researchers use to talk about these knowledge structures. Nodes are basic

ideas that can be expressed in a word, phrase or sentence. As explained above, however, it is the meaning rather than the surface structure that captures the essence of a node. Nodes are sometimes assigned to epistemic categories, such as concept, state, event, process, goal, or action. An *arc* is a connection between two nodes. An arc is directed (forward, backward, or bidirectional) and often assigned to categories (such as is-a, has-as-parts, property, contains, cause, reason). A graph consists of a set of nodes connected by arcs. Below we describe some different kinds of graphs that are depicted in Figure 5.1.

# 5.3.1 Taxonomic Structures

Taxonomic structures represent the concepts that were discussed in Chapter 4, "Concepts: Structure and Acquisition". The concepts are organized in a hierarchical structure that is connected by *is-a* arcs. A robin is-a bird, a turkey is-a bird, a bird is-a animal, an animal is-a living thing. These is-a arcs are directly represented in the graph, but others can be inferred by the principle of transitivity: a robin is an animal, a turkey is an animal, a robin is a living thing, a turkey is a living thing, and a bird is a living thing. Each of these concept nodes has distinctive *properties*, such as a robin has a red breast, a turkey is eaten by humans, a bird can fly, an animal breathes, and living things can move. These properties can be inherited by transitive inference, yielding expressions such as: a robin can fly, a robin breathes, a robin can move, a bird can move, and so forth. There is some evidence that these inferred expressions take a bit more time to judge as true or false than the direct expressions (Collins & Loftus, 1975).
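The transitivity and property-inheritance principles can be sketched in a few lines of Python. This is a toy illustration (the dictionaries encode only the example facts from the paragraph above), showing how inferred facts follow mechanically from the directly stored is-a arcs:

```python
# Toy taxonomic structure: direct is-a arcs and direct properties,
# encoding only the robin/turkey/bird example from the text.
is_a = {"robin": "bird", "turkey": "bird", "bird": "animal", "animal": "living thing"}
properties = {
    "robin": ["has a red breast"],
    "turkey": ["is eaten by humans"],
    "bird": ["can fly"],
    "animal": ["breathes"],
    "living thing": ["can move"],
}

def ancestors(concept):
    """All concepts reachable via transitive is-a inference."""
    chain = []
    while concept in is_a:
        concept = is_a[concept]
        chain.append(concept)
    return chain

def inherited_properties(concept):
    """Direct properties plus those inherited from every ancestor."""
    props = list(properties.get(concept, []))
    for ancestor in ancestors(concept):
        props.extend(properties.get(ancestor, []))
    return props

print(ancestors("robin"))              # ['bird', 'animal', 'living thing']
print(inherited_properties("robin"))   # red breast, can fly, breathes, can move
```

Note that "a robin can fly" is never stored directly; it is computed by walking up the hierarchy, which is consistent with the finding that inferred expressions take slightly longer to verify than directly represented ones.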

### 5.3.2 Spatial Structures

Spatial structures have a hierarchy of regions that are connected by *is-in* arcs (or the inverse *contains* relation). As shown in Figure 5.1, Los Angeles is-in California, San Diego is-in California, Reno is-in Nevada, California is-in the western US, Nevada is-in the western US, and the western US is-in the USA. From these, we can derive via transitivity the following inferences: Los Angeles is in the western US, San Diego is in the western US, Reno is in the western US, Los Angeles is in the USA, and so on. The locations within each region can also be connected by relational arcs that specify north, south, east, and west. We see in Figure 5.1 that Los Angeles is north-of San Diego and California is west-of Nevada. We can infer by transitivity that San Diego is west of Reno. Most of these transitive inferences are correct when we check them against actual maps. However, these inferences are not always correct (Stevens & Coupe, 1978). For example, San Diego is actually east of Reno rather than west of Reno according to an actual map. Similarly, Seattle is actually north of Toronto and El Paso is actually west of Denver. Knowledge structures and these transitive inferences are often accurate, but sometimes generate interesting errors. The knowledge structures can also predict biases in distance estimates: distances between cities within a region tend to seem shorter than distances between cities in different regions. The distance from Memphis to Jackson, Tennessee seems closer than the distance to Jackson, Mississippi, yet the actual distance is the opposite.
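The mechanism behind the Stevens and Coupe (1978) error can be sketched directly: if city-to-city relations are inferred from the regions the cities belong to, the structure predicts a relation that an actual map contradicts. The following is a toy sketch (the relations encoded are just the Figure 5.1 examples from the text):

```python
# Toy spatial structure: is-in arcs plus a west-of relation between regions.
# Inferring city relations through regions reproduces the classic error.
is_in = {"Los Angeles": "California", "San Diego": "California", "Reno": "Nevada"}
west_of = {("California", "Nevada")}   # California is west-of Nevada

def inferred_west_of(city_a, city_b):
    """Infer 'city_a is west of city_b' from the regions the cities are in."""
    return (is_in[city_a], is_in[city_b]) in west_of

# The knowledge structure predicts that San Diego is west of Reno ...
print(inferred_west_of("San Diego", "Reno"))   # True
# ... but on an actual map San Diego lies EAST of Reno.
```

The error arises because the inference is made at the region level (California vs. Nevada) rather than from the cities' actual coordinates.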

# 5.3.3 Goal-action Procedures

Goal-action procedural structures are organized into a hierarchy of nodes connected by "in order to" arcs. The nodes refer to goals or desired states that are organized hierarchically and that guide a sequence of actions that achieve the goals if the procedure is successfully performed. Imagine you have a goal of eating at a restaurant. The structure in Figure 5.1 shows how this could be accomplished. In order to eat at the restaurant, you need to get to the restaurant and order your food. In order to get to a restaurant, you need to drive your car and look for the restaurant. This specific knowledge structure in Figure 5.1 does not require careful deliberation to plan and execute. The procedure becomes a routine through experience and repetition. It would be exhausting to plan through problem solving for each step of every goal-action procedure you carry out throughout the day. However, such problem solving (see Chapter 9) is needed when a person visits another country.

The structure in Figure 5.1 is taken from the perspective of one person who needs food. However, there are other people who have their own agenda, such as the cook and the person at the counter. A script is a structure that considers all of the people who participate in the organized activity of a restaurant (Bower, Black, & Turner, 1979). The cook, the person at the counter who collects money, and the customer all have their own goal structures and perspectives. The script also has taxonomic structures (cook → employee → person) and spatial structure (table → restaurant → building).

These goal-action procedures and script structures explain a number of psychological phenomena. Each goal-action node is broken down into subordinate nodes that specify the activity in much more detail. People tend to forget the lower-level details of the actions and procedures (Bower et al., 1979), which are often automatized through repetition and experience (see Chapter 13, "Expertise"). People tend to notice obstacles to goals being accomplished and may become frustrated, as everyone who has waited many minutes trying to order food at a counter knows. When people visually observe scripts being enacted, they tend to notice event boundaries (i.e., junctures, separations) after a goal is achieved or interrupted, when there is a new spatial setting, and when a new person enters a scene (Zacks, Speer, & Reynolds, 2009). When people read stories, sentences take more time to read when they introduce new goals, spatial settings, and characters (Zwaan & Radvansky, 1998). These structures also explain answers to questions. When asked, "Why do you go to a restaurant?", a good answer goes up the structure (in order to eat food) but not down the structure (in order to drive). When asked, "How do you go to a restaurant?", a good answer goes down the structure (you drive) but not up the structure (you eat). Organized structures like these explain a large body of data involving neuroscience, cognition, behavior, emotion, and social interaction.

# 5.3.4 Causal Networks

Causal networks can be used to answer the question, "What causes something to occur?" For example, one could use causal networks to show the chain of events that cause a volcanic eruption, cancer, the winner of an election, and other phenomena in physical, biological, and technological systems (van den Broek, 2010). In a causal network, nodes represent events (or states, or processes), whereas arcs point from one node to another if an event causes or enables another event. For example, in Figure 5.1, we have a causal network showing how heart disease can result from a causally driven chain of events. Some of these events are inspired by sociological factors (getting a divorce) and psychobiological factors (smoking), whereas other events are entirely products of biological systems (hardening of the arteries). Events linked through *enables* arcs convey a weak sense of causality, while *causes* arcs indicate a stronger sense of causality. Causal networks are complex: they are not strictly hierarchical, nor do they follow a linear order, but can have many paths of connections and loops.
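Because causal networks can have multiple paths and even loops, answering "can event A lead to event B?" amounts to a graph search rather than a simple hierarchy walk. The sketch below is illustrative only: the specific chain of events and arc labels are invented to connect the factors mentioned in the text, not taken from Figure 5.1.

```python
# Toy causal network: arcs labeled "causes" (strong) or "enables" (weak).
# The specific chain below is invented for illustration.
arcs = [
    ("getting a divorce", "smoking", "enables"),
    ("smoking", "hardening of the arteries", "causes"),
    ("hardening of the arteries", "heart disease", "causes"),
]

def causal_path_exists(start, end):
    """Breadth-first search; the visited set handles loops and shared paths."""
    frontier, visited = [start], {start}
    while frontier:
        node = frontier.pop(0)
        if node == end:
            return True
        for src, dst, _label in arcs:
            if src == node and dst not in visited:
                visited.add(dst)
                frontier.append(dst)
    return False

print(causal_path_exists("getting a divorce", "heart disease"))  # True
```

The search follows arcs only in their causal direction, so heart disease cannot be inferred to cause smoking, even though the two events are connected.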

The structures in Figure 5.1 are very systematic, organized, and conceptually precise. The corresponding mental structures are not that neat and tidy. One approach to helping people learn is to have them construct such graphs during or after they comprehend a text, explore digital environments on the internet, conduct an experiment, or perform some other activity. The activity of constructing these conceptual graphs can help them learn a subject matter even though they are not likely to generate neat and tidy structures. Available research has also revealed that nodes that are more central in the structure (i.e., many arcs radiate from them) are more important and better remembered (Bower et al., 1979; van den Broek, 2010).

# 5.4 Associative Representations of Knowledge

According to classical associationism, ideas vary in how strongly associated they are with each other. That is no doubt true, but the deep secret lies in what predicts the strength of association. A word like "evil" likely has strong associations to words like "bad" (a functional synonym), "good" (an opposite), "Halloween" (an event), "Knievel" (part of the name Evel Knievel, the daredevil), and "devil" (an interesting etymology), but not to words like "smooth", "birthday", and "Michael Jordan."

What makes associations strong versus weak? Repetition is clearly one factor: the strength of association between ideas increases with the frequency with which the ideas occur together at the same time and location. Another predictor is the similarity of the ideas: the strength of association between two ideas is greater to the extent that they are similar in meaning. Positive outcomes are yet another predictor: two ideas have a stronger association to the extent that they lead to positive outcomes (a reward, a solution) rather than negative outcomes (punishment, failure). In summary, repetition, similarity, and reinforcement are major predictors of the strength of association between two ideas.

These principles of associationism have been known for at least two centuries. They are deeply entrenched in modern cognitive models of perception, categorization, memory, judgment, and other automated processes of cognition.

Figure 5.2: A neural network with an input layer, two hidden layers, and an output node.

Neural networks are a noteworthy class of models that implement associationism (McClelland & Rumelhart, 1987). Figure 5.2 presents an example of a neural network. A neural network is a structure of nodes (analogous to neurons) in multiple layers that are interconnected by directed, weighted arcs that potentially activate the nodes (positive weights) or inhibit the nodes (negative weights). A node fires (all-or-none) if the arcs that feed into it deliver enough activation, with the sum of the activation being stronger than the inhibition.

In order to illustrate the mechanisms of a neural network, consider a neural network that detects whether or not a person's face shows confusion. The input layer of nodes would correspond to states, events, or processes on parts of the face at particular positions. For example, the right eyelid opens wide, the mouth opens wide, or the left corner of the lip contracts. Ekman and his colleagues developed a facial action coding system that defines these features for those who investigate facial expressions (Ekman & Rosenberg, 2005). The output node is activated if the set of activated input node features show a pattern of confusion, but otherwise it is not activated. There may also be one or more hidden layers of nodes that refer to intermediate states, events, or processes. Exactly what these hidden nodes refer to is not necessarily clear-cut and easy to interpret. They could refer to higher order categories, such as the overall amount of movement, positive versus negative emotions, upper face parts versus lower face parts, or angle of perspective. The hidden layers and nodes within these layers are statistically derived characteristics that depend on a long history of experiences that the individual person has had. It is important to emphasize that these neural networks learn from experience. The nodes and arcs are strengthened or otherwise altered with each experience. The networks capture the associationist principles of repetition, similarity, reinforcement, and contiguity of events in time and space.
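The firing rule and layered architecture described above can be sketched with a few threshold units. This is a hand-built toy, not a trained model: the features, weights, and the "confusion" pattern are all invented for illustration, whereas a real network would learn its weights from a long history of experiences.

```python
# A minimal threshold-unit network in the spirit of Figure 5.2.
# Weights, input features, and the target pattern are invented for illustration.

def fires(inputs, weights, threshold=0.0):
    """All-or-none firing: the unit fires (1) if summed activation,
    excitation minus inhibition, exceeds the threshold; otherwise 0."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def detect_confusion(features):
    # features: [brow lowered, eyelid tightened, mouth open wide]
    hidden1 = fires(features, [1.0, 1.0, -0.5], threshold=0.5)   # excitatory pattern
    hidden2 = fires(features, [0.5, 0.5, 1.0], threshold=1.2)    # competing pattern
    # Output unit: hidden1 excites (positive weight), hidden2 inhibits (negative).
    return fires([hidden1, hidden2], [1.0, -1.0], threshold=0.5)

print(detect_confusion([1, 1, 0]))  # lowered brow + tightened eyelid -> 1 (confusion)
print(detect_confusion([0, 0, 1]))  # mouth wide open only -> 0 (not confusion)
```

Even in this tiny sketch, the hidden units do not correspond to any single nameable feature, which mirrors the interpretability problem of hidden layers noted above.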

Today neural networks are frequently used in machine learning and artificial intelligence to enable computers to perceive people, objects, events, and scenes, to guide robots in completing routine tasks, and to solve some types of problems. In this "deep learning" revolution, massive amounts of experiences are fed into the computer during training of the neural network, far more than a single person would ever receive. As a consequence, the computer outperforms humans in precisely defined tasks. This has the potential to threaten the workforce for some jobs that humans traditionally perform (Elliot, 2017). These neural networks can handle only specific tasks, however. A neural network for detecting confusion would not be of much use for detecting surprise or boredom: such networks cannot generalize and transfer to other tasks. Nevertheless, it is widely acknowledged that generalization and transfer are also very difficult for humans to accomplish (Hattie & Donoghue, 2016). Perhaps the human mind is little more than a large collection of these specialized neural networks. This is an ongoing debate in the cognitive and learning sciences.

Another example of an associative knowledge representation is latent semantic analysis, LSA (Landauer, McNamara, Dennis, & Kintsch, 2007). LSA is a statistical representation of word knowledge and world knowledge that considers which words appear together in documents, such as articles in books, speeches, conversations, and other forms of verbal communication. According to LSA, the meaning of a word depends on the other words that accompany it in real-world documents. The word *riot* often occurs in the company of particular other words in documents, such as *crowd*, *dangerous*, *protest*, *police*, and *run*. These words do not always occur with the word riot, of course, but they do with some cooccurrence probability. These probabilities of words occurring with other words define a word's meaning, which is very different from word meanings in a dictionary or thesaurus. LSA has been found to predict data in many cognitive tasks such as priming (a word automatically activates another word), judgments of sentence similarity, inferences, and summarization of text (Landauer et al., 2007). LSA has also been used in computer systems that automatically grade student essays (Foltz, 2016) and tutor students in natural language (Graesser, 2016).
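The cooccurrence idea at the heart of LSA can be sketched with a tiny corpus. The following is only a simplified illustration of the principle (the documents are invented, and real LSA additionally applies singular value decomposition to a very large corpus): a word is represented by the counts of the words that share documents with it, and similarity is the cosine between two such vectors.

```python
# Toy sketch of the cooccurrence principle behind LSA.
# Documents are invented; real LSA adds SVD over a massive corpus.
import math

documents = [
    "the crowd ran from the dangerous riot as police arrived",
    "police dispersed the riot after the protest turned dangerous",
    "the birthday cake was smooth and sweet",
]

vocab = sorted({w for doc in documents for w in doc.split()})

def vector(word):
    """Count how often `word` co-occurs with each vocabulary word."""
    counts = dict.fromkeys(vocab, 0)
    for doc in documents:
        words = doc.split()
        if word in words:
            for other in words:
                counts[other] += 1
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# "riot" is closer in meaning to "police" than to "birthday":
print(cosine(vector("riot"), vector("police")) > cosine(vector("riot"), vector("birthday")))  # True
```

Even with three invented documents, *riot* and *police* end up with similar vectors because they keep appearing in the same contexts, while *birthday* does not, which is the sense in which cooccurrence probabilities "define" word meaning.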

# 5.5 The Body in Cognition

Proponents of embodied cognition believe that mental representations are shaped and constrained by the experience of being in a human body. Our bodies influence what we perceive, our actions, and our emotions. These embodied dimensions are often incorporated in representations when we comprehend text (Zwaan, 2016) and influence how we learn (Glenberg, Goldberg, & Zhu, 2011). Embodied representations are constructed, for example, when you read a novel and get lost in the story world. There is a rich mental model of the spatial setting, the actions performed by characters, and their emotions. Your experience is similar to watching a movie or acting out the parts yourself. Mental representations are often colored with perceptual images, motoric actions, and visceral emotions rather than being abstract conceptualizations. The meaning of abstract concepts (such as love) is often fortified by these dimensions of perception, action, and emotion, such as a visual image of a wedding cake, a dance, or a first kiss (Barsalou & Wiemer-Hastings, 2005). There is substantial evidence that memory for verbal material is improved when learners construct visual images in their minds (Clark & Paivio, 1991) or perform actions associated with the content.

The importance of embodied cognition in comprehension is obvious when you go someplace new and ask for directions to a specific location, such as the city hall. When you ask a stranger, "Where is the city hall?", the helpful stranger nearly always points in the right direction and launches into several sentences with landmarks, paths, and left-right-straight comments, typically accompanied by hand gestures. You get confused by the second sentence but politely nod. Then you follow the suggested direction and soon ask the next person. The problem is that there is very little shared knowledge between you and the stranger, so you have no foundation for constructing a precise embodied path to the destination. Embodied representations are necessary for precise comprehension of important messages about the physical, social, and digital worlds.

The importance of embodied representations in reading comprehension has been confirmed in the *Moved by Reading* program (Glenberg, Goldberg, & Zhu, 2011). Readers who struggle with reading comprehension experience difficulty constructing an embodied representation of the text. Suppose that students read a text about events that occur at a tea party. This would be difficult to imagine if they had no knowledge of or experience with tea parties. In *Moved by Reading*, the student is presented with an image of a tea set on a computer screen and then asked to act out a story about the content by pouring tea, sipping tea, and performing other actions conveyed in the story. Students are also later asked to imagine acting out the story so they will internalize the strategy of constructing a mental model of the text. When compared to students who were asked to simply reread the text, the students who were asked to imagine manipulating the objects showed large gains in comprehension and memory. One of the interesting research questions is whether it is better to physically perform the actions, to digitally move images on a computer screen, or to imagine performing the actions in the mind.

## 5.6 Conversations

People have learned by observing and participating in conversations throughout most of the history of personkind, especially prior to the invention of the printing press and computer technologies. The secrets of family life and a person's livelihood were learned by holding conversations with members of a family, a tutor, a mentor, a master, or a group of people participating in practical activities. Knowledge representations are to some extent shaped by these conversations that are observed, enacted, remembered, or otherwise internalized in the mind (Vygotsky, 1978). Texts that are written in the style of stories and oral conversation are read faster, comprehended better, and remembered better than technical texts that are distant from conversation.

There is also solid evidence that one-on-one human tutoring helps students learn course subject matter more than simply listening to lectures or reading texts (Cohen, Kulik, & Kulik, 1982; VanLehn, 2011). The individual tutor can find out what problems the learner is facing, provide hints or direct assertions to help, and answer the learner's questions. Researchers have developed intelligent tutoring systems that simulate human tutors (VanLehn, 2011), including systems like AutoTutor that hold conversations with the student in natural language (Graesser, 2016). These systems help students learn subject matters like computer literacy, physics, and scientific reasoning about as well as human tutors do; both are better than conventional training methods like reading texts and listening to lectures.

Figure 5.3: A screenshot showing pedagogical agents used in an intelligent tutoring system (D'Mello, Lehman, Pekrun, & Graesser, 2014). In this example, the tutor agent, Dr. Williams, is on the left of the screen, and the peer agent, Chris, is on the right. Reprinted from Learning and Instruction, 29, D'Mello, S., Lehman, B., Pekrun, R., & Graesser, A. C., Confusion can be beneficial for learning, 153-170. ©(2014), with permission from Elsevier.

A promising approach to establishing deeper knowledge representations is to plant contradictions and information that clashes with prior knowledge, to the point that the learner experiences cognitive disequilibrium. Cognitive disequilibrium occurs when people face obstacles to goals, interruptions, contradictions, incongruities, anomalies, impasses, uncertainty, and salient contrasts. Cognitive conflicts can provoke information-seeking behavior, which engages the learner in inquiry, reasoning, and deep learning. Learning environments with computer agents have been designed to stage contradictions and debates, thereby inducing cognitive disequilibrium (D'Mello, Lehman, Pekrun, & Graesser, 2014). These studies had tutor and peer agents engage with the student in conversational trialogues while critiquing research studies in psychology, biology, and chemistry. An example screenshot is shown in Figure 5.3. Most of the research studies had one or more flaws with respect to scientific methodology.

For example, one case study described a new pill that purportedly helps people lose weight, but the sample size was small and there was no control group. During the course of the three-way conversation, the agents periodically expressed false information and contradictions. Disagreements between the agents, and with what the student believed, tended to create cognitive disequilibrium and confusion. During the course of the trialogue conversation, the agents periodically asked students for their views (e.g., "Do you agree that the control group in this study was flawed?"). The students' responses were coded for correctness and also for vacillation in making decisions when a question was asked multiple times throughout a conversation. There were also measures of confusion. The correctness and confusion scores confirmed that the cognitive disequilibrium that resulted from contradictions improved learning, particularly among the students who had enough knowledge, and were doing enough thinking, to become confused. That is, the experience of confusion, a signal of thinking, played an important role in deep learning.

Table 5.1: Key affordances of learning technologies (National Academy of Sciences, Engineering, and Medicine, 2018). ©National Academies Press. Reprinted with permission. https://www.nap.edu/catalog/24783/how-people-learn-ii-learners-contexts-and-cultures


# 5.7 Importance of Media and Technology in Knowledge Representation and Learning

Theories of distributed cognition assume that the mind is shaped and constrained by the physical world, technologies, and the other people in its environment (Dror & Harnad, 2008; Hutchins, 1995). An expert problem solver in a distributed world needs to assess whether a technology, a social community, the external physical world, or his or her own analytical mind is best suited for achieving particular steps in solving challenging problems. Judgments are involved in the decisions you make when you decide whether to trust your own analytical judgment, the output of a computer program, or the decision of a group. There are questions such as "Should I write down on a piece of paper the groceries I need to buy or try to memorize them?"; "Should I compute this square root by hand or use a calculator?"; "Should I ask my friends where to go on vacation or decide that for them?" These are decisions in a distributed world.

Media and technology play a central role in shaping cognitive representations in a distributed world. It is important to take stock of how they do so. Old-school media consisted of listening to lectures, watching video presentations, and reading books. With these media, learners passively observe or linearly consume the materials at their own pace.

Table 5.2: Mayer's (2009) Principles to Guide Multimedia Learning. Adapted from NASEM (2018). With permission from the National Academy of Sciences, Engineering, and Medicine, 2018. ©National Academies Press. https://www.nap.edu/catalog/24783/how-people-learn-ii-learners-contexts-and-cultures

However, the learning environments in today's world require learners to be more active by strategically searching through hypermedia, constructing knowledge representations from multiple sources, performing tasks that create things, and interacting with technologies or other people (Chi, 2009; Wiley et al., 2009). From the standpoint of technology, it is worthwhile to take stock of the characteristics of learning environments that facilitate active, constructive, and interactive learning. Table 5.1 shows some of these characteristics, which were identified by the National Academy of Sciences, Engineering, and Medicine in the second volume of *How People Learn* (NASEM, 2018). It is important to consider these characteristics when selecting technologies to support the acquisition of knowledge representations for different subject matters, populations, and individual learners. All of these characteristics have been implemented in learning technologies and have shown some success in improving knowledge representations and learning.

Unfortunately, there is an abundance of commercial technologies that are not well designed, are not based on scientific principles of learning, and have no evidence that they improve learning. Many products offer the bells and whistles of multimedia (a lot of razzle dazzle), but under the hood there is no substance that helps people learn and build useful knowledge representations. We live in a world replete with games and social media that contribute to shallow rather than deep knowledge representations.

It is important to consider the characteristics of the learning technologies that support deeper knowledge representations and learning (Millis et al., 2019; NASEM, 2018). Mayer (2009) has also identified 12 principles of multimedia learning that improve knowledge representation and acquisition (see Table 5.2). These principles are all based on psychological theories and confirmed by data collected in experiments.

The hope is that stakeholders and policy makers in education will encourage learning environments that support the knowledge representations needed in the 21st century. Citizens in the 21st century are faced with complex technologies, social systems, and subject matters (National Research Council, 2012; Levy & Murnane, 2006). Mastery of facts and routine procedures is necessary, but not sufficient, for participation in a world that demands deeper comprehension of technical material and more complex problem solving, reasoning, information handling, and communication. Understanding the nature of knowledge representations will be extremely important in meeting this challenge.

#### Summary


different types of relations (e.g., is-a, has-a, contains, causes). Four example knowledge structures were discussed: taxonomic, spatial, causal, and goal-action procedures.


#### Review Questions


#### Hot Topic

Art Graesser

Our research, conducted with colleagues in the interdisciplinary Institute for Intelligent Systems, investigates language, discourse, and learning. Our primary focus is on the mastery of deep knowledge rather than shallow knowledge in adults. Examples of shallow knowledge are facts, definitions, and routine procedures, whereas deep knowledge involves causal reasoning, justification of claims with evidence, resolution of contradictions, precise quantification of ideas, and problem solving (Graesser, 2015). The workforce of the 21st century is increasingly expected to acquire deep knowledge as routine tasks are handled by robots and other digital technologies. Unfortunately, the process of deep learning is challenging because the material is difficult, useful strategies are sometimes novel, and some of the accompanying emotions are negative, such as confusion and frustration (D'Mello, Lehman, Pekrun, & Graesser, 2014). Moreover, our current educational systems are typically designed for acquiring shallow knowledge rather than deep knowledge.

One approach to helping adults acquire deep knowledge is to develop computerized intelligent tutoring systems. These systems have pedagogical strategies that are tailored to the knowledge, skills, and abilities of individual students. We have developed a system called AutoTutor (Graesser, 2016), in which a student learns by having conversations with animated conversational agents (computer-generated avatars). AutoTutor presents difficult questions or problems, often with associated figures and diagrams; the student and AutoTutor then have a multiturn conversation to co-construct an answer or solution. AutoTutor has been developed and tested on a number of difficult subject matters, such as computer literacy, physics, electronics, scientific reasoning, and comprehension strategies. These conversational ITSs have shown significant learning gains on deep knowledge compared with pretests and control conditions such as reading text. Some versions of AutoTutor implement "trialogues" that involve a conversation between the student and two computer agents, a tutor and a peer (Graesser, Li, & Forsyth, 2014). The two agents can model good social interaction and productive reasoning, and at times argue with each other to show different perspectives and resolutions of conflicts (D'Mello et al., 2014).

We have investigated other approaches to improve deep learning through language and discourse (Graesser, 2015). These include investigating inference generation and mental models during the comprehension of stories, technical text, illustrated texts, hypertext, and hypermedia. We have developed computer systems (available on the internet for free) that scale texts on difficulty (Coh-Metrix, http://cohmetrix.com) and questions on comprehension problems (QUAID, http://quid.cohmetrix.com). We have investigated collaborative problem solving where groups of people in computer-mediated communication tackle problems that individuals cannot solve alone. A curriculum for 21st century skills is destined to include discourse technologies that facilitate deeper knowledge acquisition.

Keith T. Shubeck

Anne M. Lippert

#### References


Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. *Current Directions in Psychological Science*, *23*, 374–380. doi:10.1177/0963721414540680

### References


disciplinary literacy. *Educational Psychologist*, *51*(2), 219–246. doi:10.1080/00461520.2016.1168741


*ucational Research Journal*, *46*(4), 1060–1106. doi:10.3102/0002831209333183


# Glossary


# Chapter 6

# Metacognition: Monitoring and Controlling One's Own Knowledge, Reasoning and Decisions

KLAUS FIEDLER, RAKEFET ACKERMAN & CHIARA SCARAMPI

University of Heidelberg, Israel Institute of Technology & University College London

# 6.1 Introduction: What is Metacognition?

# 6.1.1 Setting "Metacognition" Apart from "Cognition"

Metacognition is the "top manager" of cognitive functioning. Memory, for instance, consists of basic cognitive functions for storing and retrieving information. Metacognitive processes are responsible for regulating these functions: setting goals for learning, examining the quality of memory storage and retrieval, allocating time to memory processes, choosing among strategies for reasoning, making decisions, and acknowledging the achievement of goals. Metacognition is not separate from cognition, but integral to all higher-order cognitive inferences, including explicit learning, skill development, recall of personal events, communication, decision making, problem solving, navigation, design, etc. It refers to the superordinate and, in a way, most responsible level of all cognitive functions. It constitutes the quality control of one's own mental functions.

The prefix "meta" in Greek loanwords denotes "something that consciously references or comments upon its own subject" (https://www.dictionary.com/). Thus, metacognition is cognition about one's own cognition. It serves to *monitor* the correctness of our cognitive operations and to *correct* incorrect operations in order to control the costs and benefits of our judgments and decisions (Nelson & Narens, 1990). To illustrate, an invoice must be checked (monitoring) and corrected for potential calculation errors (control). Before a written exam is submitted, all responses must be validated (monitoring) and revised if necessary (control). Purchasing decisions must be confirmed (monitoring) or revised if the expected results are unsatisfying (control).

# 6.1.2 Metacognitive Monitoring and Control

The output of the metacognitive monitoring function provides the input to the metacognitive control function (Nelson & Narens, 1990). Monitoring judgments, the critical assessment of the mental operations used to transform the stimulus information, are preconditions for appropriate corrections and for any decisions or actions. Thus, the veracity of a verbal communication has to be assessed critically before one can decide whether to trust, distrust, or discard the communication. One monitors the navigation of one's car or boat in order to make a controlled decision at the next branching. Or, monitoring the money one has spent on prior occasions provides a precondition for the controlled use of the remaining budget.

Metacognition is ubiquitous because virtually all cognitive operations are monitored and controlled, before, during, and after their execution. The execution of an action plan—such as telling a story about what we did last weekend—is not confined to retrieval and speech activities; it also involves monitoring operations such as keeping track of the position reached in the story, checking grammar and pronunciation, assessing the available time left, receiving signals from communication partners, or noting the ease (or difficulty) with which story details come to mind (e.g., "I don't recall the name now, it will probably come to mind soon"). As a function of these monitoring results, one can then control speed and story detail, correct for mistakes, secure comprehension, and maybe change one's nonverbal behavior in order to appear honest.

Figure 6.1 provides a schematic overview of generic monitoring and control functions involved in different stages of cognitive processing, from acquisition to retention, retrieval, and inferences leading to judgments and decisions. It is an extended version of a diagram originally presented in a seminal article by Nelson and Narens (1990), which focused on memory processes. As apparent from the direction of the arrows, monitoring functions are informed by the contents of the primary cognitive processes, whereas control functions constitute metacognitive influences exerted on the cognitive processes, informed by monitoring results.

Metacognition covers both meta-memory and meta-reasoning (see Ackerman & Thompson, 2015, 2017). That is, monitoring and control functions are not only concerned with memory proper but also with memory-dependent reasoning processes leading to judgments and decision making. Thus, a cognitive-ecological perspective on judgment and decision making calls for an extended metacognitive approach, which must not only regulate internal cognitive functions but also check the validity and usability of environmental information samples. In this regard, Figure 6.1 indicates that for judgments and decisions to be unbiased and accurate, the weight given to sampled information must depend on a critical assessment of its validity and trustworthiness.

Figure 6.1: Schematic overview of major monitoring and control functions, based on Nelson and Narens (1990).

# 6.2 Review of Insights Gained from Metacognition Research

In a review of four decades of pertinent research (see Kornell & Bjork, 2007; Son & Sethi, 2010), some milestones can be identified. The general research theme is the interplay of monitoring and control (Nelson & Narens, 1990), which need not be strictly unidirectional (see Koriat, Ma'ayan, & Nussinson, 2006). Yet, only when monitoring is reliable can people have a solid basis for effective control of strategies and allocation of effort and resources.

# 6.2.1 Metacognitive Regulation of Effort

Imagine Lisa, a student, who is studying a chapter in a textbook for an exam. While reading the chapter, Lisa considers her proficiency and decides whether to restudy a previous paragraph, look for additional information on the Internet, continue to the next paragraph, or stop studying, either because she is not progressing adequately today or because she knows the entire chapter to a satisfactory degree. All these regulatory functions rely on monitoring her knowledge of each paragraph in the chapter. This assessment allows Lisa to identify the weak points and those she has already mastered.

The available empirical evidence on effort regulation was to a large extent collected with simple methodologies involving memorized lists of words or word pairs. Nevertheless, the scientific insights gained from these paradigms are robust and generalizable to many other cognitive tasks (e.g., solving problems, answering knowledge questions, learning from texts, decision making). In a typical paired-associate memory study, people are asked to memorize pairs of related or unrelated words (e.g., KING – CROWN; FLAG – POT) presented one after the other. They are allowed to allocate time to each item freely. Immediately after memorizing each word pair, people assess their chance of success by providing a Judgment of Learning (JOL). For adults, the tasks typically involve memorizing 60 word pairs presented in a random order. After memorizing all of them, there is a recall phase, in which the left words are presented one by one in a new order, and participants are asked to recall the right word that accompanied each of them in the study phase. Analyses of study time, JOL, and recall success provide evidence about the way people allocate study time across items and in various conditions (e.g., high motivation for success; repeated learning; emotionally loaded vs. neutral words; words presented in large or small fonts).
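
The paradigm described above lends itself to a simple data layout. The following sketch (in Python, with invented names and numbers; the chapter itself contains no code) illustrates the kind of per-item record such a study yields and one typical analysis, comparing JOLs for recalled versus forgotten items:

```python
# Hypothetical per-item records from a paired-associate study: study time,
# Judgment of Learning (JOL, 0-100), and later recall success.
trials = [
    {"pair": ("KING", "CROWN"), "study_time_s": 2.1, "jol": 85, "recalled": True},
    {"pair": ("FLAG", "POT"),   "study_time_s": 6.4, "jol": 40, "recalled": False},
    {"pair": ("SOCK", "FOOT"),  "study_time_s": 1.8, "jol": 90, "recalled": True},
]

# Compare mean JOL for recalled vs. forgotten items: if judgments track
# memory, recalled items should have received higher JOLs on average.
recalled_jols = [t["jol"] for t in trials if t["recalled"]]
forgotten_jols = [t["jol"] for t in trials if not t["recalled"]]
mean_recalled = sum(recalled_jols) / len(recalled_jols)
mean_forgotten = sum(forgotten_jols) / len(forgotten_jols)
```

With larger item sets, the same records support the study-time and conditions analyses mentioned above.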

The causal role of JOL in effort regulation was established by Metcalfe and Finn (2008). They asked participants to study half of the word pairs once and the other half three times within the study list. Participants then provided their JOLs and recalled the items. Not surprisingly, JOL and recall were higher for the items learned three times than for those learned only once. In a second block, the items studied once in the first block were now presented three times, and vice versa. All items were thus learned four times altogether, and recall of both sets was equivalent. However, JOLs were higher for items learned three times than for those learned only once in the first block, presumably because of the advantage in the initial recall test after the first block. This effect of a previous test on JOL is called *memory for past test*. Most relevant for effort regulation is that when providing JOL for the second block, which differed between the item sets, participants were also asked whether they would like to restudy each item. Although recall performance was equivalent for both item sets, participants chose to restudy items for which JOL was lower—those studied only once in the first block. This finding demonstrates that effort regulation decisions, like decisions to restudy items, depend on JOL rather than on actual memory strength. Similarly, people relied on JOL when betting on success, even when these judgments were misleading (Hembacher & Ghetti, 2017).

Using more complex learning and memory tasks, Thiede, Anderson, and Therriault (2003) found that judgments of comprehension guide decisions to restudy texts. When these JOLs were more reliable, participants were better attuned to their knowledge level and chose to restudy the less well-known texts. This strategy led to higher achievement, demonstrating that effort regulation becomes more effective with more reliable JOLs. For visual perception, subjective confidence guided decisions to request a hint that was helpful for choosing between two options (Desender, Boldt, & Yeung, 2018). Notably, the tight association between monitoring and control was reduced among clinical populations and enhanced among young and healthy people (e.g., Danion, Gokalsing, Robert, Massin-Krauss, & Bacon, 2001; Koren et al., 2004). Thus, a well-functioning monitoring-control link should not be taken for granted.

The next question to ask is when people stop investing effort. That is, what are the stopping rules that guide effort regulation? A regular finding is that people invest more time in studying the more difficult items (Zacks, 1969). This finding led to the development of *Discrepancy Reduction Models*, which assume that people set a target level according to their motivation in the given scenario. The target acts as a stopping rule: they study each item until monitoring indicates that their knowledge of this item is satisfactory (Nelson & Narens, 1990; see Figure 6.2). For more difficult items (B in Figure 6.2) this takes longer than for easier items (A). There are conditions, such as time pressure, under which the stopping criterion is lowered, reflecting a compromise in the target level of knowledge (Thiede & Dunlosky, 1999). High motivation for success, in contrast, leads people to raise their stopping criterion, yielding longer time investment aimed at increasing the chances of success (Koriat et al., 2006).
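
The discrepancy reduction idea can be captured in a few lines of code. The sketch below (Python; all parameter values and function names are invented for illustration) studies an item until a monitored knowledge estimate reaches the stopping criterion or a time limit is exhausted:

```python
def study_item(difficulty, criterion, time_limit, base_gain=0.1):
    """Simulate studying one item: knowledge grows each time step (more
    slowly for harder items) until the monitored estimate reaches the
    stopping criterion or the time limit runs out."""
    knowledge, time_spent = 0.0, 0
    while knowledge < criterion and time_spent < time_limit:
        time_spent += 1
        knowledge = min(1.0, knowledge + base_gain / difficulty)
    return time_spent, knowledge

easy_time, _ = study_item(difficulty=1.0, criterion=0.8, time_limit=30)
hard_time, _ = study_item(difficulty=3.0, criterion=0.8, time_limit=30)
# A very difficult item hits the time limit before the criterion is met,
# illustrating a cap on time investment that curbs labor-in-vain.
capped_time, capped_knowledge = study_item(difficulty=10.0, criterion=0.8, time_limit=30)
```

Lowering `criterion` (e.g., under time pressure) shortens study times across the board, mirroring the compromise in the target level of knowledge described above.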

As known from real-life scenarios, when the items to be studied are extremely difficult, people may give up early, even when they acknowledge that they do not know them as well as they would have desired. This strategy is effective because it reduces labor-in-vain: time invested in items that have a low chance of being mastered even after extensive effort. Moreover, this strategy allows more time to be invested in other items, at intermediate difficulty levels, which have a higher chance of being mastered (Son & Sethi, 2010). Indeed, it was shown that people compromise on their target level as more time is invested. They also set a time limit, beyond which they are not willing to invest further time in studying an item (Ackerman, 2014). This time limit is adjusted upward when learners have high motivation and downward when they learn under time pressure (Undorf & Ackerman, 2017).

One more consideration is order effects. Dunlosky and Ariel (2011) demonstrated that, when presented with several items, people tend to choose to restudy items encountered earlier in their habitual reading order (e.g., from left to right) rather than those appearing later. When study time is too short to master all materials, investing too much in early list parts is counterproductive relative to waiving the most difficult items: the time invested in the difficult items, when they appear early in the list, could be used more effectively for studying easier items appearing later in the list. Beyond this order effect, Dunlosky and Ariel (2011) also found indications of waiving the most difficult items. Thus, these strategies are complementary rather than mutually exclusive.

Figure 6.2: Illustration of the discrepancy reduction model, based on Ackerman and Goldsmith (2011, Figure 1). It shows the stopping criterion and the regulatory role of Judgment of Learning (JOL) in guiding the decision whether to continue or cease learning. A – early termination with overconfidence, B – termination with perfect calibration, C – point of decision to continue learning because the stopping criterion had not yet been reached.

Generalizing these principles to text learning, Ackerman and Goldsmith (2011) compared learning printed texts to learning the same texts presented on computer screens. In both cases, participants were allowed to write comments and highlight text sections. In the computerized condition, participants believed they were learning more quickly than on paper, and thus stopped learning earlier (see Figure 6.2, point A). In fact, though, the rate of learning was equivalent in both media. As a result, performance in tests taken immediately after studying was lower in the computerized condition than in the printed-text condition. This apparently reflects the role of overconfidence in effort regulation—people stop when they think they know the materials adequately. If they are overconfident, stopping will be premature. Later studies showed that learning in computerized environments suffers most under limited learning time (for a meta-analysis, see Delgado, Vargas, Ackerman, & Salmerón, 2018). Similar overconfidence effects were found with problem-solving tasks of the types students encounter in math, logic, geometry, and psychometric tests (Ackerman, 2014; Sidi, Shpigelman, Zalmanov, & Ackerman, 2017).

# 6.2.2 The Heuristic Bases for Metacognitive Judgments

The metacognitive judgments regarding memory, reading comprehension, and solutions to problems introduced in the preceding section are known to be based on heuristic cues (see Dunlosky & Tauber, 2014, for a review; Koriat, 1997). Thus, people cannot directly "read" their knowledge or the quality of their own cognitive processing, but must instead base their judgments on cues experienced while performing the task and immediately after completing it.

One prominent cue is *fluency*—the subjective ease with which a cognitive task is performed. Fluency is assumed to underlie many metacognitive judgments, and it is indeed a rather valid cue for success. For instance, memorizing the word pair TUBER – AZORES is hard because the words are rarely encountered and their pairing is rather unusual. When memorizing this word pair among sixty other pairs, the chance of remembering the right word when encountering the left one remains low despite investing a lot of effort, which means that this item's fluency is low. In contrast, when a pair consists of familiar words that are often encountered in the same context (e.g., SOCK – FOOT), cued recall is typically quick and has a high chance of success, and is thus characterized by high fluency. Koriat, Ma'ayan, and Nussinson (2006) suggested that in such contexts people use a *memorizing effort heuristic*: longer learning times, experienced as lower fluency, indicate a lower probability of remembering the item later.

The predictive accuracy of metacognitive judgments depends on the diagnosticity of the utilized cues. A great deal of research focused on conditions under which heuristic cues, like fluency, can be misleading. For instance, people may feel that they found the correct solution for a problem right away and based on fluency be confident they solved it successfully, while in fact they are wrong, and investing more effort could increase their chance of success. Thus, identifying factors that induce predictable biases in people's confidence is important because such biases impair effort regulation.

The potentially misleading impact of heuristics suggests that metacognitive judgments are dissociable from the actual success of cognitive processes; factors that affect performance do not necessarily affect judgments regarding the same cognitive processes, and vice versa. In particular, dissociation of JOL from actual performance can stem from surface properties of the to-be-learned items affecting perceptual fluency rather than the more relevant cue of processing fluency. Rhodes and Castel (2008) found higher JOLs for words printed in a large font than for those printed in smaller fonts, although recall was less affected by font size (see Undorf, Zimdahl, & Bernstein, 2017, for a similar perceptual influence on JOL). Conversely, other variables have more pronounced effects on performance than on JOLs. For instance, rehearsal improves recall, and long delays between learning and test cause substantial forgetting, yet JOLs are hardly sensitive to either (Koriat, 1997; Koriat, Bjork, Sheffer, & Bar, 2004). Thus, the accuracy of JOLs and other metacognitive judgments depends on the validity of the utilized cues.

An effective and easy-to-adopt solution to several biases of JOLs is to delay JOL elicitation until a time closer to the test, rather than collecting it immediately after learning. The *delayed JOL effect* is robust (see Rhodes & Tauber, 2011, for a meta-analysis). The accuracy of delayed JOLs reflects access to more diagnostic heuristic cues from long-term memory, which represent the state of knowledge at test better than cues available immediately after learning each item.

In the context of problem solving, Ackerman and Zalmanov (2012) compared performance and confidence in the solutions of multiple-choice and open-ended test formats. As expected, they found higher success rates in the multiple-choice test format than in the open-ended test because of guessing or identifying the correct option when it was readily available. However, subjective confidence ratings were equivalent in both test formats; they did not reflect this performance difference. Confidence in the same solutions was, however, sensitive to response time: lower for slow responses than for quick responses. This finding reflects utilization of fluency. Similarly, Fernandez-Cruz, Arango-Muñoz, and Volz (2016) found sensitivity to processing fluency for both feeling of error and final confidence in a numerical calculation task. Thompson and colleagues (2013) examined fluency effects on final confidence and on Feeling of Rightness (FOR)—an initial confidence judgment collected immediately after producing the first solution that comes to mind, and before rethinking the solution. They used misleading math problems and considered both processing fluency, based on ease of processing, and perceptual fluency, manipulated by font readability (e.g., hard- vs. easy-to-read fonts). Both FOR and final confidence reflected processing fluency, as both judgments were associated with response times. However, neither of the examined judgments reflected perceptual fluency, unlike the aforementioned font-size effects on JOL. This example of a difference between metacognitive judgments of memory processes and of reasoning processes suggests that research should delve into commonalities and differences across tasks (Ackerman & Beller, 2017; see Ackerman & Thompson, 2015, for a review).

Convincing evidence for the role of fluency in judgments, as reflected by response time, was provided by Topolinski and Reber (2010). Using three different types of problems, they first presented each problem and then, after either a short or a longer delay, presented a potential answer as the target stimulus. Participants had to judge whether the presented answer was the correct solution to the presented problem. For both correct and incorrect candidates, solutions that appeared faster were more frequently judged to be correct than those presented after a delay. Because the timing of the solution display was the only difference, the findings indicate that the mere delay led to lower endorsement of answers as correct.

Two other heuristic cues were shown to affect feelings of knowing regarding answers to knowledge questions. The first cue is the familiarity of the question terms or the knowledge domain (e.g., Reder & Ritter, 1992; Shanks & Serra, 2014). The second cue is accessibility, which reflects the number of associations that come to mind during a retrieval attempt, regardless of whether this information promotes retrieval of correct answers (Koriat, 1993). For example, Koriat and Levy-Sadot (2001) composed general knowledge questions that differed in the familiarity of their terms (e.g., the ballets "Swan Lake" vs. "The Legend of Joseph") and in accessibility, operationalized as the number of names people can provide for a category (e.g., people tend to know more composers than choreographers). These cues contributed independently to feeling-of-knowing judgments, which were higher for more familiar objects, especially when items were highly accessible. Accessibility also affected judgments regarding problem solutions (Ackerman & Beller, 2017). Although not necessarily reflected in response time, it is possible that familiarity and accessibility affect fluency by shaping the experienced ease of processing.

Focusing on a rarely considered cue, Topolinski, Bakhtiari, and Erle (2016) examined the effects of ease of pronunciation on judgments of solvability: the quick assessment as to whether the problem is solvable (has a solution) or contains a contradiction that does not allow one to solve it at all. Topolinski and colleagues presented participants with solvable anagrams (scrambled words) and unsolvable letter sets that could not be rearranged to form a valid word, and manipulated their pronounceability. For instance, for the word EPISODE, they had two anagram options: EDISEPO and IPSDEOE. Easy- and hard-to-pronounce versions also existed for the unsolvable letter sets. As expected, easy-to-pronounce anagrams were more often rated as solvable than hard-to-pronounce anagrams, regardless of whether the anagrams were in fact solvable. This finding is particularly interesting because in reality anagrams that are easier to pronounce are often harder to solve, since people find it harder to rearrange their letters. Thus, pronounceability may function as a misleading heuristic cue for metacognitive judgments.

Most heuristic cues considered in memory and reasoning research refer in some way to semantic knowledge activated in verbal tasks. This is the case for relatedness of word pairs, familiarity of question terms, accessibility of relevant knowledge, and pronounceability, as reviewed above. Studying heuristic cues that affect perceptual decisions provides opportunities to consider non-semantic heuristic cues. In a study by Boldt, De Gardelle, and Yeung (2017), participants judged the average color of an array of eight colored shapes and rated confidence in their choice. The higher the variability of colors across the eight shapes, the lower the participants' confidence in their choice of the average color, even when actual difficulty was equated. Thus, people utilize misleading heuristic cues in perceptual tasks just as they do in verbal tasks.

When considering the bases for metacognitive judgments, in particular those associated with fluency, a natural question is whether people base their judgments on the experience of ease while performing the task (experience-based cues) or on knowledge about cognitive processes that is general rather than specific to the current experience with the item at hand (theory-based cues; Koriat, 1997). For instance, the unjustified effect of font size on JOL mentioned above could stem from an experience of easy learning when the fonts are large relative to an experience of difficulty when the fonts are small (Undorf & Zimdahl, 2018). The same effect on JOL could also stem from people's implicit theories of learning, according to which large print helps memorizing while small print adds a challenge to the task. Attempts were made to separate the two information sources. Kelley and Jacoby (1996) aimed to focus on experience-based cues while controlling for potential theories people might hold. They presented participants with anagrams (scrambled words). In the first phase of the experiment, participants studied the solution words to half of the anagrams. This prior exposure led to faster solutions of those anagrams in a second phase, as the correct solution came to mind more easily. After experiencing such processing ease, participants expected these anagrams to be easier for other people to solve relative to anagrams they had solved without prior exposure to the answers. This finding demonstrates the intricate role of experience-based cues in metacognitive judgments.

The relative contribution of experience-based fluency and theory-based beliefs is a source of debate in research on heuristic cues. Mueller, Tauber, and Dunlosky (2013) found that the theory-based belief that related word pairs (SOCK – FOOT) are easier to remember than unrelated word pairs (PARROT – GAZ) dominated JOLs over effects of experience-based processing fluency. Based on Undorf and Erdfelder's (2015) counter-evidence that experience-based fluency is nevertheless an important basis for JOLs, both teams later concluded that theory-based beliefs contribute to JOLs in addition to experience-based fluency (Mueller & Dunlosky, 2017; Undorf & Zimdahl, 2018).

In sum, metacognitive judgments are prone to predictable biases due to their reliance on heuristic cues that are generally valid, though misleading under particular conditions. Understanding the factors people take into account when making metacognitive judgments is essential for any attempt to educate and improve effort and behavior regulation.

# 6.2.3 Knowing What You Know: Judgment Accuracy

Judgments and decisions are generally accompanied by a subjective feeling of confidence, aimed at assessing the probability of being correct. This metacognitive judgment serves as a guide for current and future behavior, helping people avoid repeating the same mistakes and evaluate whether the available information suffices to make a reliable decision.

Most research on confidence has focused on the relation between confidence judgments and objective performance on a criterion task, with the aim of investigating how well individuals can monitor their own knowledge. Two main aspects of judgment accuracy can be distinguished: resolution (or metacognitive sensitivity) and calibration (or metacognitive bias). Resolution refers to the ability to distinguish between one's correct and incorrect answers (Fleming & Lau, 2014), whereas calibration refers to the extent to which confidence judgments are overconfident (i.e., more optimistic than actual performance) or underconfident (i.e., less optimistic).

*Resolution.* Resolution plays an important role in metacognitive control processes and people's behavior (Nelson & Narens, 1990). Imagine a student facing a multiple-choice test in which errors are penalized whereas omissions are not. The test will be approached differently depending on the assessment the student makes of their candidate answers. If an answer is judged as correct, it may be worthwhile responding and risking the penalty. In contrast, if an answer is assessed as wrong, the student might decide to withhold the response. The decision to produce or withhold an answer is thus determined by resolution. Perfect resolution will lead to offering all the candidate responses that are indeed correct and withholding all incorrect responses. Conversely, poor resolution, at the same level of knowledge, may lead to withholding some of the correct answers and to offering a portion of the incorrect ones, resulting in penalties and lost opportunities for points (Higham & Higham, 2018).

Several indexes of resolution can be computed to assess the accuracy of a judgment once it has been elicited. All measures require an independent performance criterion against which the relationship between confidence and accuracy can be quantified. In previous research, resolution has commonly been measured using within-participant gamma correlations between confidence and accuracy (Nelson, 1984). As an alternative, other researchers have suggested signal detection theory (SDT; Green & Swets, 1966; see Figure 6.4 below), which assesses discrimination between objective states of the world (e.g., distinguishing signal from noise, or the presence or absence of a stimulus). Applied to metacognitive judgments, resolution can be seen as sensitivity to a signal. More precisely, the primary cognitive task (e.g., memory, decision making, etc.) is often called the Type 1 task, whereas the task of discriminating, via confidence ratings, between one's own correct and incorrect responses on the Type 1 task is called the Type 2 task. Advocates of SDT have argued that gamma correlations can be problematic, as they can be affected by the overall tendency to use higher or lower confidence ratings (i.e., metacognitive bias; Fleming & Lau, 2014). Nevertheless, gamma correlations continue to be used in metacognition research. Above-chance confidence-accuracy correlations were found in a variety of tasks, ranging from perceptual decision making to challenging problem solving, indicating that people are skilled at identifying whether their responses are correct or wrong (see Ackerman & Zalmanov, 2012; Koriat, 2018, and references therein).
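
As an illustration of how resolution can be quantified, the sketch below (Python; the data are invented) computes the Goodman-Kruskal gamma correlation between confidence ratings and response accuracy by counting concordant and discordant pairs of items:

```python
from itertools import combinations

def gamma_correlation(confidence, correct):
    """Goodman-Kruskal gamma between confidence ratings and accuracy (0/1):
    (concordant - discordant) / (concordant + discordant), ignoring ties."""
    concordant = discordant = 0
    for (c1, a1), (c2, a2) in combinations(zip(confidence, correct), 2):
        product = (c1 - c2) * (a1 - a2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)

# Confidence mostly tracks accuracy here, so gamma comes out positive,
# indicating good resolution.
g = gamma_correlation([90, 80, 70, 60, 50], [1, 1, 0, 1, 0])
```

Gamma ranges from -1 to +1, with values near +1 indicating that higher confidence reliably accompanies correct responses.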

*Calibration.* Another key monitoring accuracy measure in metacognition and self-regulation is calibration. A simple measure of calibration is the difference between mean confidence across items and the actual success rate. Several studies have indicated that people tend to be overconfident across a variety of conditions (Dunning, Heath, & Suls, 2004). In particular, Kruger and Dunning (1999) documented a metacognitive bias through which relatively unskilled individuals not only make erroneous responses but also overestimate their abilities. That is, a deficit in knowledge prevents poor performers from realizing how poorly they are performing. However, if trained to become more competent, their self-assessments also become more accurate.
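
This calibration measure amounts to simple arithmetic. A minimal sketch (Python; the numbers are invented): the bias score is mean confidence minus the proportion of correct answers, with positive values indicating overconfidence:

```python
def calibration_bias(confidence, correct):
    """Mean confidence (expressed as probabilities) minus proportion correct.
    Positive values indicate overconfidence, negative values underconfidence."""
    mean_confidence = sum(confidence) / len(confidence)
    accuracy = sum(correct) / len(correct)
    return mean_confidence - accuracy

# Mean confidence of 0.75 against an actual success rate of 0.50:
# the positive difference signals overconfidence.
bias = calibration_bias([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Note that a person can show zero bias on this measure while still having poor resolution, since the two aspects of accuracy are computed differently.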

Calibration and resolution are independent measures. An individual may have high overall confidence, but poor resolution and vice versa (Fleming & Lau, 2014). Nevertheless, recent research has shown that the two are not independent when the probabilistic structure of the environment is considered (Koriat, 2018). Across a series of experiments using two-alternative forced choice items from different domains (e.g., perceptual decision making, general knowledge, memory, and predictions about others' judgments, beliefs, and attitudes), Koriat (2018) found that resolution is strictly dependent on the accuracy of Type 1 task performance and that positive correlations between confidence and accuracy observed across many studies are confined to items for which accuracy is better than chance. Furthermore, calibration depended on task difficulty: items with accuracy smaller than 50% led to a strong overconfidence bias, whereas items for which accuracy was better than chance were associated with almost perfect calibration. These results support the proposition that for difficult items that are likely to elicit erroneous responses, individuals are largely unaware of making a mistake. Consistent with this account, the overconfidence bias decreases markedly when the selective reliance on difficult items is avoided through representative sampling (Gigerenzer, Hoffrage, & Kleinbölting, 1991).

Another key element of metacognitive judgments is the time of elicitation. Judgments can be prospective (i.e., occurring before performing a task) or retrospective (i.e., occurring after task completion). For example, a student may reflect on their current knowledge to predict their success on an upcoming test (prospective judgment) and judge afterwards how well they did, trying to estimate their grade (retrospective judgment). Few behavioral studies have pitted prospective against retrospective judgments for the same task. Siedlecka, Paulewicz, and Wierzchoń (2016) compared prospective and retrospective confidence judgments. Participants rated whether presented words were the solutions to anagram tasks. Participants also rated their certainty, either before or after seeing the suggested solution. The authors found that post-decision confidence ratings were more accurate than ratings made prospectively. Resolution and calibration were also found to be higher in retrospective than in prospective judgments by Fleming, Massoni, Gajdos, and Vergnaud (2016), using a perceptual decision task. Retrospective confidence ratings were provided on every trial, whereas prospective judgments were provided only prior to every fifth trial. The authors found dissociable influences on prospective and retrospective judgments. Whereas retrospective judgments were strongly influenced by current-trial fluency and by accuracy and confidence in the immediately preceding decision, prospective judgments were influenced by previous confidence over a longer time frame. Furthermore, individual overconfidence was stable across prospective and retrospective judgments, suggesting that overconfidence represents a stable personality trait (Ais, Zylberberg, Barttfeld, & Sigman, 2016; Jackson & Kleitman, 2014).

As many reasoning and problem-solving tasks extend over time, the assessment of performance and success probability must be updated repeatedly (Ackerman, 2014). Intermediate confidence is an internal estimate of the adequacy of possible responses considered before arriving at a final solution (see Ackerman & Thompson, 2017). To study this process, Ackerman (2014) asked participants to rate their intermediate confidence every few seconds until they provided a solution, after which they rated their final confidence. The first intermediate judgment turned out to be a good predictor of the amount of time participants spent solving the problems. Confidence tended to increase over time. However, whereas at the beginning participants tended to provide answers only when confidence was high, over time they became more willing to provide answers at lower levels of confidence. Confidence in final responses could be as low as 20%, even when there was an option to give up by answering "I don't know".

The study of confidence judgments has been extended in the last few decades to collective decision making. In numerous perceptual as well as cognitive decisions, interacting individuals can make more accurate decisions by discussing their own perceptual experiences with others and integrating different opinions, achieving a reliable collective benefit even in the absence of objective feedback (Bahrami et al., 2010). That is, the accuracy achieved by sharing and combining subjective information via social interaction can exceed the accuracy of each individual opinion, even that of the best individual in the group. This phenomenon is known as the "two-heads-better-than-one" effect (Koriat, 2012) or the "wisdom of the crowd" (Surowiecki, 2004). Koriat (2012) presented participants with two-alternative forced-choice tasks and showed that members of a dyad can take advantage of the wisdom of the group by using a simple heuristic: choosing the response expressed with the highest level of confidence. These findings have relevant implications for collective and democratic decisions and actions.
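Koriat's maximum-confidence heuristic can be illustrated with a small simulation. This is only a sketch under assumed parameters — the evidence model, the sensitivity value, and the trial count are hypothetical choices, not taken from the original study. Each dyad member receives a noisy internal signal whose sign determines the answer and whose magnitude serves as confidence; the dyad adopts the answer of the more confident member:

```python
import random

def simulate_dyads(n_trials=20000, sensitivity=1.0, seed=42):
    """Toy model of Koriat's (2012) maximum-confidence heuristic.

    Each trial has a correct answer (+1 or -1). Each dyad member
    receives a noisy internal signal centred on the correct answer;
    the sign of the signal is the member's choice and its absolute
    value serves as that member's confidence. The dyad adopts the
    choice of whichever member is more confident.
    """
    rng = random.Random(seed)
    ind_correct = 0   # correct choices summed over both individuals
    dyad_correct = 0  # correct choices under the max-confidence rule
    for _ in range(n_trials):
        truth = rng.choice([-1, 1])
        s1 = rng.gauss(sensitivity * truth, 1.0)
        s2 = rng.gauss(sensitivity * truth, 1.0)
        ind_correct += ((s1 > 0) == (truth > 0)) + ((s2 > 0) == (truth > 0))
        winner = s1 if abs(s1) >= abs(s2) else s2
        dyad_correct += (winner > 0) == (truth > 0)
    return ind_correct / (2 * n_trials), dyad_correct / n_trials

ind_acc, dyad_acc = simulate_dyads()
print(f"mean individual accuracy: {ind_acc:.3f}")
print(f"dyad (max-confidence) accuracy: {dyad_acc:.3f}")
```

Under these assumptions the max-confidence dyad reliably outperforms the average individual, mirroring the "two-heads-better-than-one" effect — provided, as in Koriat's data, that confidence actually tracks accuracy.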

## 6.2.4 Neuroscience of Metacognition

In recent years, the study of metacognition has been enriched by growing evidence from neuroscience concerning the underlying neurocognitive architecture. Specific neural substrates (especially in frontolateral, frontomedial, and parietal regions; see Figure 6.3) are involved in metacognition (e.g., Fleming, Huijgen, & Dolan, 2012; Fleming, Ryu, Golfinos, & Blackmon, 2014; Fleming, Weil, Nagy, Dolan, & Rees, 2010). However, the neural bases of human metacognition remain controversial. Metacognition operates on a variety of first-order processes, ranging from memory to perception and problem solving. The diversity of the tasks to be monitored and controlled complicates the study of its neural signature, as it can be difficult to differentiate between the neural activations attributable to metacognitive monitoring and control processes and the neural signature of the first-order cognitive/emotional processes (Metcalfe & Schwartz, 2016).

Existing attempts to isolate metacognitive monitoring and control processes from first-order processes testify to the uniqueness of metacognitive processes. Initial evidence was obtained from neuropsychological cases. For instance, Shimamura and Squire (1986) suggested that frontal lobe (behind the forehead) impairments in patients with Korsakoff's syndrome—a chronic memory disorder characterized by severe anterograde amnesia—can impact metacognitive judgments independently of cognitive performance per se. A common finding suggests that neural signals involved in error monitoring originate in the posterior medial frontal cortex (pMFC; Dehaene, Posner, & Tucker, 1994).

Since the introduction of these seminal studies, further research in domains such as memory, perception, and decision making has identified neural correlates of metacognitive judgments and further dissociated cognitive from metacognitive processes. Fleming et al. (2010) had participants perform a perceptual decision-making task and provide ratings of confidence after each decision. The authors found considerable variation between participants in metacognitive accuracy. Using MRI, they found that this variation in confidence accuracy correlated with grey matter volume in the right rostrolateral areas of the prefrontal cortex (PFC). Furthermore, greater accuracy in metacognitive judgments was associated with the microstructure of white matter connected with this area of the PFC. These results point to neural bases of metacognition that differ from those supporting primary perception. Similarly, in a study by Do Lam and colleagues (2012), participants who had first learned pairwise associations between faces and names were then presented again with each face and asked to provide judgments of learning (JOLs) regarding the chance of recalling the associated name. A neurological dissociation was found between the processes of memory retrieval, which were located in the hippocampal region (i.e., medial temporal lobes), and those underlying JOLs, which were located in the medial PFC, orbitofrontal cortex (OFC), and anterior cingulate cortex (ACC).

Anatomical, functional, and neuropsychological studies have confirmed the consistent involvement of a frontoparietal network in metacognition (Vaccaro & Fleming, 2018). Activations were located in the posterior medial PFC, ventromedial PFC, and bilateral anterior PFC/dorsolateral PFC. Other researchers observed activations in the bilateral insula and dorsal precuneus (Vaccaro & Fleming, 2018). These results suggest that the parietal cortex, particularly the precuneus, and the insula represent key nodes supporting metacognition, together with the PFC.

Existing research supports the existence of neural dissociations between prospective and retrospective metacognitive judgments (Chua, Schacter, & Sperling, 2009; Fleming & Dolan, 2012). For example, in a study on patients with lateral frontal lesions, Pannu, Kaszniak, and Rapcsak (2005) found impaired retrospective confidence judgments, but preserved judgments of future task performance. Conversely, Schnyer and colleagues (2004) found an association between damage to the right ventromedial PFC and a decrease in accuracy for metacognitive judgments about future recall (feeling of knowing), but not for accuracy of retrospective confidence judgments. Further evidence comes from functional MRI studies, which have shown that prospective metacognition activates medial aspects of the PFC, while retrospective metacognitive accuracy is correlated with lateral PFC activity (Fleming & Dolan, 2012). When separating metamemory judgments by temporal focus in their meta-analysis, Vaccaro and Fleming (2018) found that retrospective judgments were associated with activity in the bilateral parahippocampal cortex and left inferior frontal gyrus, whereas prospective judgments activated the posterior medial PFC, left dorsolateral PFC, and right insula.

Figure 6.3: Gross neuroanatomy. a) Relative position and direction of brain structures. b) The four brain lobes from a lateral view. c) and d) Approximate locations of the broadest subdivisions of the PFC and other areas linked to metacognition. Illustrations adapted from Patrick J. Lynch, medical illustrator, C. Carl Jaffe, MD, cardiologist, under the Creative Commons Attribution 2.5 License, 2006 (CC-BY-2.5). Retrieved from https://commons.wikimedia.org/wiki/File:Brain\_human\_lateral\_view.svg and https://commons.wikimedia.org/wiki/File:Brain\_human\_sagittal\_section.svg. *Abbreviations*: dmPFC, dorsomedial prefrontal cortex; vmPFC, ventromedial prefrontal cortex; ACC, anterior cingulate cortex; dlPFC, dorsolateral prefrontal cortex; rlPFC, rostrolateral prefrontal; vlPFC, ventrolateral prefrontal cortex; OFC, orbitofrontal cortex.

Nevertheless, neuroimaging evidence directly comparing different judgment types is scarce. In one of the few studies directly comparing neural activation related to prospective feeling-of-knowing and retrospective confidence judgments, Chua and colleagues (2009) found an association between prospective judgments and activation in the medial parietal and medial temporal lobes, whereas retrospective judgments were associated with inferior prefrontal activity. However, common activations associated with both prospective and retrospective judgments were also observed in regions of the medial and lateral PFC and mid-posterior areas of the cingulate cortex. These results suggest that neural activations related to different judgment types may differ in degree rather than in kind (Vaccaro & Fleming, 2018).

Another relevant question tackled in neuroscience is whether metacognition relies on a common, domain-general resource or on domain-specific components that are particular to the respective first-order tasks. Recent neuroimaging studies yielded pertinent evidence for both domain-general and domain-specific neural markers (see Rouault, McWilliams, Allen, & Fleming, 2018, for a review). A frontoparietal network contributes to metacognitive judgments across a range of different domains. Still, neuroimaging evidence for direct comparisons is scarce. In a recent meta-analysis, Vaccaro and Fleming (2018) observed common regions in separate investigations of memory and decision-making tasks, which included: insula, lateral PFC, and posterior medial PFC. As suggested by Morales et al. (2018), this result may indicate that judgments in both memory and decision making are driven by common inputs. The meta-analysis also pointed to further regions that are activated by specific tasks. More precisely, meta-memory engaged left dorsolateral PFC and clusters in bilateral parahippocampal cortex, whereas right anterior dorsolateral PFC was involved in decision making (Vaccaro & Fleming, 2018).

In summary, the neural underpinnings of even the most straightforward metacognitive judgments are complicated. Although metacognition can be dissociated from task performance, most studies have revealed activations in multiple brain areas, and differences have emerged between prospective and retrospective judgments. Convergent evidence indicates that the function of the rostral and dorsal areas of the lateral PFC is important for the accuracy of retrospective judgments of performance. In contrast, prospective judgments of performance seem to depend on the medial PFC. Recent studies have resulted in a rather nuanced picture, suggesting the co-existence in the brain of both domain-specific and domain-general signals.

# 6.3 Metacognitive Perspectives on Applied Rationality

The research reviewed so far has proven to be fruitful and thought-provoking, suggesting metacognitive explanations of adaptive behavior. We have seen that metacognitive deficits can lead to irrationality and inefficiency. In particular, we have reviewed memorable evidence on the illusion of knowledge, which consists of the gross overestimation of one's chance of success, typically brought about by deceptive feelings of fluency or flow (Fiedler, 2013). Overconfidence, in particular, can be a major source of bias and a dangerous obstacle in decision making under risk and under uncertainty (Glaser & Weber, 2007; Kruger & Dunning, 1999).

The metacognitive perspective is of particular importance for applied research on rational thinking, adaptive regulation, medical diagnosis and treatment, democratic decision making, lie detection, debunking of fake news, argumentation, trust, (im)moral action, and procedural justice in courtrooms, selection committees, or executive decisions. Checking and optimizing the quality of higher-order cognitive operations—the very domain of metacognition—is crucial for rational and responsible behavior. We illustrate this point in the remainder of this section.

# 6.3.1 Legal Judgments and Decisions

A classical domain of metacognitive research in legal psychology is eyewitness identification performance. Because everybody expects eyewitnesses to identify the perpetrator in a lineup, and because the persons in the presented lineup are much more vivid than the original persons in a past episode, the enhanced readiness to make a positive recognition decision produces many correct identifications (when the identified suspect is indeed the perpetrator) but also many incorrect identifications (when the identified suspect is not the perpetrator).

As illustrated in Figure 6.4, a liberal identification criterion (a relatively leftward position of C) produces, say, 90% correct identifications but roughly 40% incorrect identifications. Such a high false-alarm rate can be conceived of as a case of overconfidence; C is apparently too weak a criterion to discriminate between guilty and innocent persons, yielding an intolerably high rate of wrong convictions. Consistent with this account, a majority of exoneration cases after the introduction of DNA evidence turned out to involve innocent victims of incorrect eyewitness identification.
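The criterion logic can be made concrete with a small equal-variance signal-detection sketch. The d′ and criterion values below are illustrative assumptions, chosen only so that a liberal criterion approximates the 90%/40% figures quoted above; they are not estimates from real lineup data:

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def identification_rates(d_prime, criterion):
    """Equal-variance signal-detection model of lineup identification.

    Memory strength for innocent suspects ~ N(0, 1); for guilty
    suspects ~ N(d_prime, 1). An identification is made whenever
    strength exceeds the criterion C.
    """
    correct_id = 1.0 - normal_cdf(criterion, mean=d_prime)  # hit rate
    false_id = 1.0 - normal_cdf(criterion, mean=0.0)        # false-alarm rate
    return correct_id, false_id

# A liberal criterion yields many correct but also many false IDs;
# shifting C rightward cuts false IDs more than correct IDs.
for c in (0.25, 0.75):
    hit, fa = identification_rates(d_prime=1.5, criterion=c)
    print(f"C = {c:.2f}: correct IDs {hit:.2f}, false IDs {fa:.2f}")
```

With these assumed values, moving C from 0.25 to 0.75 lowers the false-identification rate by more than it lowers the correct-identification rate, which is the criterion-shift argument developed in the Hot Topic box below.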

The distinction between prospective and retrospective confidence judgments is also relevant to eyewitness testimony (Nguyen, Abed, & Pezdek, 2018). Shortly after witnessing a crime, witnesses are often asked to rate their ability to recognize the perpetrator in the future (prospective confidence). Subsequently, when asked to identify someone from a lineup, eyewitnesses are asked how confident they are that they identified the correct person as the perpetrator (retrospective confidence). Nguyen, Abed, and Pezdek (2018) found that postdictive confidence was a better indicator of identification accuracy than predictive confidence, both for faces of the same race as the witness and for cross-race faces. Consistent with the lab findings reviewed above, this suggests that eyewitness confidence should be collected at the time of identification rather than earlier, at the crime scene.

# 6.3.2 Metacognitive Myopia as a Major Impediment of Rationality

Less optimistic insights were obtained in other areas of metacognition research. Rational judgments and decisions about economic, political, legal, and health-related issues rely heavily on the critical assessment of both the logical correctness of mental operations and the validity of the underlying evidence. A conspicuous deficit in this sorely needed function of critical assessment has been termed *metacognitive myopia* (Fiedler, 2000, 2008, 2012). As the term "myopia" (short-sightedness) suggests, experimentally demonstrated violations of rational norms typically do not reflect insufficient attention or insensitivity to the stimulus data. On the contrary, people are quite sensitive to the data given; they are in a way too sensitive, taking the data for granted and failing to discriminate between valid and invalid information. For example, when judging the success of different stocks on the stock market, participants were quite sensitive to the frequency with which various stocks were reported in TV programs among the daily winners. However, they failed to take into account that the daily winning outcomes of some stocks had been reported in more than one TV program (Unkelbach, Fiedler, & Freytag, 2007). Although they fully understand that two TV programs on the same day provide the same stock-market news, participants do not exhibit much success in taking the redundancy into account. Even when they are explicitly reminded of the redundancy and instructed not to be misled by such repetitions, they cannot avoid its misleading influence. This failure to overcome a known pitfall is a metacognitive flaw.

Analogous findings were observed across many experimental tasks. Fully irrelevant numerical anchors influence quantitative judgments (Wilson, Houston, Etling, & Brekke, 1996). Samples that dramatically over-represent the base-rate of rare events (e.g., samples in which the prevalence of HIV is 50% rather than 0.1% as in reality) are used to estimate associated risks (Fiedler, Hütter, Schott, & Kutzner, 2018). Correctly denied questions referring to objects or behaviors not included in a film nevertheless increased the probability that the non-existing objects were later recalled erroneously (Fiedler, Armbruster, Nickel, Walther, & Asbeck, 1996). In a perseverance paradigm, explicit debriefing about an experimental lie did not erase the implications and psychological consequences of the lie (Ross, Lepper, & Hubbard, 1975). Common to all these findings is that participants, who fully understand that invalid stimulus information should be discarded, are nevertheless influenced by that invalid information.

Figure 6.4: Signal detection analysis of eyewitness-identification performance: the solid (dashed) curve represents the distribution of memory strength when the suspect in a lineup is (is not) the real perpetrator. Discriminability is the average horizontal difference d' between curves. An identification decision is made when memory strength exceeds the criterion C. The areas right of C under the solid (dashed) curve are the probabilities of correct (incorrect) identification.

The conspicuous naivety with which information is used and retained uncritically, regardless of its invalidity, is reminiscent of Hannah Arendt's (1963) admonition that compliance and uncritical conformity are the origin of severe harm and violations of legal norms of humanity. But although the metacognitive "superego" residing in prefrontal brain areas is ethically obliged to engage in critical testing and reconfirmation, its role in higher-order cognition is often impoverished. Meta-analyses of modern research on debunking (Chan, Jones, Hall Jamieson, & Albarracín, 2017), for instance, testify to the inability of scientific or political debriefing to erase fake news or obvious myths. Thus, even when the public are fully debriefed that Iraq did not possess any atomic bombs when the US invaded, that the evidence on global warming is incontestable, or that polygraph lie detection is not supported by reliable studies, people change their erroneous beliefs only slightly and continue to hold the discredited wrong beliefs to a considerable extent.

When it comes to integrating different individual opinions in group decision making or advice taking, a typical uncritical strategy is equal weighting of opinions, in spite of better knowledge or even explicit feedback about the clearly unequal competence of different advice givers (Fiedler et al., 2018; Mahmoodi et al., 2015). Recent research by Powell, Yu, DeWolf, and Holyoak (2017) showed that the attractiveness of products offered by Amazon may depend on quantity (number of available reviews) more than on quality (mean rating provided by previous customers). Confusion of quantity and quality was also observed by Fiedler, Kareev, Avrahami, Beier, Kutzner, and Hütter (2016), who found that increases (decreases) between samples of two symbols in the proportion of one critical symbol were readily detected only when absolute sample size also increased (decreased).

In causal reasoning, metacognitive myopia is evident in a tendency to focus exclusively on effect strength and to disregard the strength of the causal input that was necessary to induce the observed effect. For example, the impact of a drug on athletic performance is judged to be higher if the same dose of the drug causes a performance increase of 10 scale points rather than 1 point. However, whether 254 mg or only 34 mg of the drug was necessary to induce the same observed performance change is given little weight (Hansen, Rim, & Fiedler, 2013).
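The normative standard that participants neglect is a simple ratio: performance change per unit of causal input. A minimal worked calculation, using the chapter's illustrative numbers, makes the point:

```python
def potency(effect_points, dose_mg):
    """Normative index of causal strength: performance change per mg."""
    return effect_points / dose_mg

# Same observed 10-point gain, but very different causal potency:
print(f"{potency(10, 254):.3f} points/mg")  # 254 mg needed -> ~0.039
print(f"{potency(10, 34):.3f} points/mg")   # 34 mg needed  -> ~0.294
```

Both drugs produce the same visible effect, yet the second is several times more potent — exactly the information that, according to Hansen, Rim, and Fiedler (2013), judges tend to ignore.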

Why do irrational consequences of metacognitive myopia extend from objectively difficult to such trivially easy task settings? Why do people continue to be influenced by invalid information which is obviously wrong (like an irrelevant numerical anchor) and which they explicitly classify as invalid?

A tentative answer might lie in a kind of metacognitive learned-helplessness effect (Maier & Seligman, 1976). Homo sapiens may have learned that many real-life tasks do not provide us with sufficient information for a normatively sound monitoring and control process. Thus, a Bayesian algorithm required to correct for biases in an information sample is often unknown or does not exist at all. This experience may then be over-generalized to easy situations in which monitoring and control would be simple and straightforward. In any case, metacognitive myopia seems to constitute a major impediment in the way of rational behavior.

# Acknowledgment

The work underlying the present article was supported by a grant provided by the Deutsche Forschungsgemeinschaft to the first author (FI 294/26-1).

#### Summary


#### Review Questions


### Hot Topic

#### More on eyewitness memory

Signal-detection analysis has been extremely helpful in clarifying the metacognitive origin of the serious errors in eyewitness identifications. Even though witnesses' memory cannot be influenced in retrospect—that is, the discriminability d' of correct and incorrect memories is typically invariant—it has been shown that the rate of false identifications can be markedly reduced by simply inducing a more conservative response strategy, that is, a higher criterion C. A glance at Figure 6.4 will easily confirm that a rightward shift of C (up to the intersection point of both curves) will reduce the number of incorrect identifications (area right of C under the dashed curve) more than the number of correct identifications (area right of C under the solid curve), thus increasing the overall rate of correct decisions. Such clever metacognition research has led to a commonly noted improvement of legal practices (Wells et al., 2000).

Now, after two or three decades of improved lineup procedures, a recent state-of-the-art review by Wixted and Wells (2017) has arrived at the optimistic conclusion that "our understanding of how to properly conduct a lineup has evolved considerably". Under pristine testing conditions (e.g., fair lineups uncontaminated by administrator influence; an immediate confidence statement), eyewitness "(a) confidence and accuracy are strongly related and (b) high-confidence suspect identifications are remarkably accurate" (p. 10).

Klaus Fiedler

#### Computerized learning environments

Computerized environments are replacing paper-based environments for training, learning, and assessment. However, a puzzling finding is screen inferiority—a disadvantage in learning from computer screens even when the task draws on capabilities considered well-suited for modern technologies like computers or e-books (see Gu, Wu, & Xu, 2015, for a review).

A recent metacognitive explanation proposes that computerized environments provide a contextual cue that induces shallower processing than paper environments (e.g., Daniel & Woody, 2013; Morineau, Blanche, Tobin, & Guéguen, 2005). Metacognitive research on reading comprehension has found that JOL reliability is poor (see Dunlosky & Lipko, 2007, for a review).

Rakefet Ackerman

Notably, studies provide growing evidence that associates computerized learning with inferior metacognitive processes, particularly with consistent overconfidence and less effective effort regulation (for a review, see Sidi et al., 2017).

Chiara Scarampi

Self-regulated learning needs guidance. Metacognitive scaffolding can support preparatory phases of orientation and planning, monitoring of progress while learning, and retroactive activities such as reflection (Roll, Holmes, Day, & Bonn, 2012). Given that learning and JOL reliability can be improved through self-questioning, appropriate test expectancy, analyzing the task, and delayed summaries (Wiley, Thiede, & Griffin, 2016), it is interesting that screen inferiority could be ameliorated by guiding participants to increase mental effort expenditure. This was achieved by asking participants to proofread, edit, and write keywords summarizing texts' contents (Eden & Eshet-Alkalai, 2013; Lauterman & Ackerman, 2014). Apparently, then, in-depth text processing is the default on paper, whereas on screen an external trigger is required to enhance metacognitive processes leading to enhanced performance (Sidi et al., 2017).

#### References


# References


test. *Metacognition and Learning*, *9*(1), 25–49. doi:10.1007/s11409-013-9110-y


dictions. *Psychological Bulletin*, *95*(1), 109–133. doi:10.1037/0033-2909.95.1.109


ments: Evidence from patients with lesions to frontal cortex. *Neuropsychologia*, *42*(7), 957–966. doi:10.1016/j.neuropsychologia.2003.11.020


actual solvability, and length on intuitive problem assessments of anagrams. *Cognition*, *146*, 439–452. doi:10.1016/j.cognition.2015.10.019


fects of stimulus size on judgments of learning. *Journal of Memory and Language*, *92*, 293–304. doi:10.1016/j.jml.2016.07.003


# Glossary


# Chapter 7

# Deductive Reasoning

JONATHAN ST. B. T. EVANS

University of Plymouth

A deduction is a conclusion that follows from things we believe or assume. Frequently, we combine some fact or observation with a rule or rules that we already believe. For example, you meet Sue, who tells you that Mary is her mother. You immediately infer that Sue is Mary's daughter, although that is not the fact that was presented to you. This means that you must carry around rules such as

If x is the mother of y and y is female, then y is the daughter of x

Of course, I am not saying that you are conscious of this rule or of applying it to make this deduction, but it must be there in some form in your brain, together with some mechanism for applying it to facts and observations. In this case, the inference occurs rapidly and effortlessly, but this is not always the case with deductive reasoning. Take the case of claiming allowances when completing an income tax return. In this case there may be many rules, and their wording may be complex and opaque to those who are not expert in tax law. If your financial affairs are complex, even establishing the relevant facts may be a headache. This is why people often pay expert tax advisers to do the reasoning for them.

Deduction has a clear and obvious benefit for human beings. Our memories are limited and our brains can only store so many beliefs about the world. However, if we also hold a number of general rules, then these can be applied to draw out implications as and when they are required. Deduction is also involved in hypothetical thinking, when we ask 'What if?' questions. An example is science, in which theories must be tested against empirical evidence. Scientific theories take the form of rules, often formalised with mathematics. When experimental studies are run, we set up some conditions and then predict the outcome. The prediction is a deduction, which is used to test the theory. For example, climate change scientists have been predicting for the past twenty years or more that warming temperatures would disrupt the jet stream and lead to more extreme weather events. These predictions were calculated from their mathematical models, which is a form of deductive reasoning. Both abnormal jet stream flows and extreme weather events have been observed in recent years with increasing frequency, lending credibility to these models.

For deduction to be useful, it needs to be accurate. This has been recognised in the discipline of philosophy for centuries. Philosophers devised systems of *logic*, whose purpose is to ensure accurate deduction. A logically valid argument is one whose conclusion necessarily follows from its premises. Put simply, this means that in a logical argument the conclusion must be true if the premises are true. If the mother-daughter rule given earlier is true (which it is by convention) and your observation that Sue is female is also correct, then she *must* be Mary's daughter.

Logic provides rules for reasoning. Here are a couple of examples:

Modus Ponens Given if x then y, and the assumption of x, y is a valid conclusion

Disjunction elimination Given x or y and not-x, y is a valid conclusion

Modus Ponens is very useful, because it means we can state hypothetical beliefs, which only apply when some condition is met. Some of the conditional sentences we use in everyday life are necessarily true, for example, 'if a number is even and greater than two, it cannot be prime', but most are not. For example, we may advise someone 'if you catch the 8.00 am train then you will get to work on time'. If this is generally true, then it is good advice, but of course the train might break down. The real world rarely allows inferences to be certain, but we nevertheless use conditional statements a great deal because of the natural power of Modus Ponens. A disjunctive statement is an either-or. For example, someone might say 'I will either catch the 8.00 am train or take the bus at 8.10'. If you later learn that they did not catch the train, you can deduce that they took the bus instead. Once again, in the real world, the inference will not be certain. The individual may have called in sick and not gone to work at all. But our deduction is valid, given the assumptions on which it is based.
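Both rules can be sketched as simple inference procedures. This is a toy illustration, not a full logic engine; the propositions are the train/bus examples from the text encoded as strings:

```python
def modus_ponens(conditionals, facts):
    """Forward-chain Modus Ponens: given 'if x then y' rules and known
    facts, repeatedly add consequents whose antecedents are established."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in conditionals:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived

def disjunction_elimination(disjunct_a, disjunct_b, known_false):
    """Given 'a or b' and the negation of one disjunct, return the other."""
    if known_false == disjunct_a:
        return disjunct_b
    if known_false == disjunct_b:
        return disjunct_a
    raise ValueError("negated proposition is not one of the disjuncts")

rules = [("catches the 8.00 am train", "gets to work on time")]
print(modus_ponens(rules, {"catches the 8.00 am train"}))

# Knowing the train was NOT caught eliminates that disjunct:
print(disjunction_elimination("catches the 8.00 am train",
                              "takes the 8.10 bus",
                              "catches the 8.00 am train"))
# prints: takes the 8.10 bus
```

As in the text, the validity of these inferences is conditional on the premises: the code, like the logic, simply draws out what the assumptions already contain.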

There is a tradition in philosophy that logic is the basis for rational thought (Henle, 1962; Wason & Johnson-Laird, 1972). This view held a powerful influence on psychology during the second half of the twentieth century and was responsible for a major method of studying human reasoning, which I will call the deduction paradigm (Evans, 2002). Huge numbers of experiments were run with this paradigm and I will try to summarise their main findings in this chapter. While many important things were learnt about the nature of human thinking and reasoning, a lot of psychologists eventually lost faith in the importance of logic for rational thinking. In recent years, this has led many to revise their methods and adopt what is called the new paradigm psychology of reasoning. I will explain the new paradigm and some of the findings it has led to at the end of this chapter. For now, I will focus on the deduction paradigm and the theories and findings that are associated with it.

The deduction paradigm tests whether people *untrained in logic* can make valid inferences. The idea behind this is that if logic is the basis for rationality in everyday thinking then everyone should comply with it, not just those who have taken logic classes. So the first condition in this method is to exclude participants with formal training. The next is to present them with some premises, or assumptions, from which a logical deduction can be made. Often a conclusion is also given and people are asked whether it follows or not. Two other instructions are usually given: (1) assume that the premises given are true and (2) only make or endorse a conclusion which necessarily follows from them. Given these instructions, only the form of the argument should matter, not the content. For example, people should always agree that Modus Ponens and disjunction elimination are valid arguments, no matter what we substitute for x and y in the rules given above. By this means, the paradigm assesses whether or not people are logical in their deductive reasoning.

# 7.1 The Deduction Paradigm: The Main Methods and Findings

A small number of experiments on deductive reasoning were published early in the twentieth century (Wilkins, 1928; Woodworth & Sells, 1935), and they anticipated what was to come from the intensive study that occurred from the 1960s onwards. That is to say, people were observed to make frequent logical errors, to show systematic biases, and to be influenced by their beliefs about the content of the premises and conclusions. All of these findings have been replicated many times since, using three major methods, or sub-paradigms: syllogistic reasoning, conditional inference, and the Wason selection task. In this section I will discuss each in turn, explaining the methods and typical findings.

#### 7.1.1 Syllogistic Reasoning

A syllogism involves two premises and three terms, which I will call A, B and C. This is the most ancient system of logic, devised by Aristotle. You may have come across the famous syllogism 'All men are mortal, Socrates is a man, therefore Socrates is mortal.' Classical syllogisms have statements in four *moods*, shown in Table 7.1 (a). These statements can be used for either the first premise, the second premise, or the conclusion in any combination. Let us consider them in turn.

Figure 7.1 shows diagrammatically several different models for the relation between two categories, A and B. When we examine the different statements in Table 7.1 (a) we see that most of them are ambiguous and can be represented by at least two different models. For example, All A are B would be true for a model of identity – All men have Y chromosomes – or where B includes A – All boys are male. No A are B is unambiguous; it can only refer to a model of exclusion. Some A are B is highly ambiguous – it is true in all models except exclusion. Finally, Some A are not B is true for exclusion but also for a model in which A includes B – Some males are not boys. This gives us a clue to the complexity of syllogistic reasoning, as we have to take account of all possible ways that the categories could be related. Moreover, when we combine two premises, we have to consider the ways in which all three categories A,

Table 7.1: The structure of classical syllogisms.

#### (a) Mood of premises

| Letter | Mood |
|--------|------|
| A | All A are B |
| E | No A are B |
| I | Some A are B |
| O | Some A are not B |

#### (b) Figure of syllogism


*Note*: The letters A, E, I and O are classically used as abbreviations for the four moods of the premises.

B, and C could be related. For the argument to be valid, its conclusion has to be true in all models of this three-way relationship that the premises allow.

A fallacy is an argument whose conclusion need not be true, given the premises. A basic finding with syllogistic reasoning is that participants endorse many fallacies. This has been reported by many authors and confirmed in the one study (to my knowledge) that presented every possible combination of premises and conclusions for evaluation (Evans, Handley, Harper, & Johnson-Laird, 1999). But these mistakes are not random – there are systematic biases in syllogistic reasoning. Consider the following syllogism:

All A are B

All B are C

Therefore, All C are A

This is a fallacy: the conclusion does not necessarily follow. Yet in the study of Evans et al. (1999), 77% of participants (university students) said that the conclusion necessarily followed from the premises. This is really odd when you consider the set relationships involved (see Figure 7.2). Surely, the most likely situation that describes the premises is the model to the left, showing that A is a subset of B and B a subset of C. For example:

All Alsatians are dogs; all dogs are mammals

But in that case, the conclusion endorsed would be 'All mammals are Alsatians,' which is obviously false. The conclusion All C are A would only be true in the second model, where A, B, and C are all identical, a most unusual state of affairs. This is a finding which you only get with abstract materials, where letters are used to represent categories. But why does it occur? It is consistent with a very old claim called the *atmosphere effect*: participants are inclined to accept conclusions whose mood matches that of the premises. In the same study, only 47% of participants said the following syllogism was valid:

All A are B

All B are C

Therefore, Some C are A

This is stranger still because Some C are A has to be true whenever All C are A is. Not only that, but the conclusion is actually *valid* in this case. You can verify that by examining the models of the premises shown in Figure 7.2. Of course, the mood of the conclusion does not match that of the premises here, so the lower endorsement rate is consistent with the atmosphere effect. In fact, atmosphere is consistent with many but not all responses observed in syllogistic reasoning tasks (Evans, Newstead, & Byrne, 1993). Another known biasing factor is the figure of the syllogism, that is, the order in which the terms are arranged (Table 7.1b). This also affects people's perception of validity.
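The claim that All C are A is a fallacy while Some C are A is valid can be checked by brute force. The sketch below (illustrative Python, not from the chapter) enumerates every way the eight regions of a three-set diagram can be inhabited, keeps the models satisfying the premises, and tests each conclusion. It assumes the classical convention of existential import, i.e. that the categories A, B and C are all nonempty.

```python
from itertools import product

# Each region of a three-set diagram is a triple (in A, in B, in C);
# a model is the set of regions that contain at least one individual.
REGIONS = list(product([False, True], repeat=3))

def models(premises_hold):
    """Yield every model (set of inhabited regions) satisfying the premises,
    under existential import: A, B and C are each nonempty."""
    for bits in product([0, 1], repeat=8):
        present = {r for r, b in zip(REGIONS, bits) if b}
        if all(any(r[i] for r in present) for i in range(3)) and premises_hold(present):
            yield present

def all_(i, j, m):   # "All X are Y": no inhabited region is in X but not in Y
    return not any(r[i] and not r[j] for r in m)

def some(i, j, m):   # "Some X are Y": some inhabited region is in both
    return any(r[i] and r[j] for r in m)

# Premises: All A are B (index 0 -> 1), All B are C (index 1 -> 2)
premises = lambda m: all_(0, 1, m) and all_(1, 2, m)

print(all(all_(2, 0, m) for m in models(premises)))  # False: 'All C are A' is a fallacy
print(all(some(2, 0, m) for m in models(premises)))  # True: 'Some C are A' is valid
```

A conclusion is valid only if it holds in *every* model the premises allow, which is exactly what the final two lines test.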

There are a number of high-profile theories of syllogistic reasoning based on different principles and giving broadly accurate explanations of the data (for reviews, see Evans et al., 1993; Manktelow, 1999). When realistic content is introduced, however, other factors come into play, especially belief bias. Consider the following syllogism:

No addictive things are inexpensive

Some cigarettes are inexpensive

Therefore, some addictive things are not cigarettes 71%

Figure 7.1: Models of relations between two categories, A and B.

Figure 7.2: Models of relations between three categories, A, B and C, consistent with the premises All A are B, All B are C.

A major study which established the influence of beliefs was that of Evans, Barston, and Pollard (1983). Over three experiments, they found that 71% of participants agreed that this syllogism was valid, that is, that the conclusion must be true if the premises are true. Now consider this syllogism presented in the same study:

No millionaires are hard workers

Some rich people are hard workers

Therefore, some millionaires are not rich people 10%

In this case, only 10% thought the syllogism was valid. But if you look carefully you can see that both syllogisms have the same logical form, and that form is actually *invalid*. The key difference between the two is that the first realistic version has a believable conclusion (we know that there are addictive things other than cigarettes), whereas the second has an unbelievable one – millionaires are rich by definition. When valid arguments were used, people also more often thought they were valid if they believed the conclusion (89%) than if they did not (56%), so there is a belief bias for valid arguments as well, although not as strong. These findings have been replicated many times since (for a review see Klauer, Musch, & Naumer, 2000).

#### 7.1.2 Conditional Inference

Conditional statements, also known simply as conditionals, have the form if p then q. We use many such statements in real life for all kinds of purposes. Here are some examples:

*Causal*: If you heat water sufficiently, it will boil.

*Prediction*: If you vote Republican, you will get your taxes cut.

*Tip*: If you study hard, you will pass the examination.

*Warning*: If you miss the 8.00 am train, you will be late for work.

*Promise*: If you wash my car, I will give you ten dollars.

*Threat*: If you stay out late again, you will be grounded.

*Counterfactual*: If you had putted well, you would have won the match.

Philosophers have long studied conditional statements, considering them of particular importance for human reasoning, and have written many books on the subject (an excellent review of the philosophical work is given by Edgington, 1995). A great deal of work in the psychology of reasoning has also focussed on conditional statements (for a recent review see Nickerson, 2015). I wrote an entire book on 'if' myself, collaborating with a philosopher to cover the perspectives of both traditions, philosophy and psychology (Evans & Over, 2004). The reason they are so important is that they are central to a unique human facility which I call *hypothetical thinking* (Evans, 2007). That is the ability to imagine how things might be in the future or how they might have been different in the past.

Table 7.2: The four main conditional inferences.

| Inference | Major premise | Minor premise | Conclusion | Valid? |
|-----------|---------------|---------------|------------|--------|
| Modus Ponens (MP) | If p then q | p | q | Yes |
| Denial of the Antecedent (DA) | If p then q | not-p | not-q | No |
| Affirmation of the Consequent (AC) | If p then q | q | p | No |
| Modus Tollens (MT) | If p then q | not-q | not-p | Yes |

Standard logic provides an account of how we should reason with conditionals. Some of this standard account is disputed by both philosophers and psychologists, but all are agreed about the four inferences shown in Table 7.2. We have already encountered Modus Ponens (MP) as an example of a valid deductive argument. Imagine a situation where cards have a letter written on one side and a number on the other. Then we can express a conditional hypothesis such as

If the letter is B, then the number is 3

For conditional inference, we need to assume that the conditional is true. This is the *major premise* of the deductive argument. The *minor premise* is an assertion that either the first or second part of the conditional is true or false, leading to the four arguments illustrated in Table 7.2. For example, if we suppose that the first part is true – the letter is B – then it follows by MP (Modus Ponens) that the number is 3. The other valid argument that can be made is Modus Tollens (MT). Suppose that the number on the card is not a 3. Then it follows logically that the letter is not a B. Why? Because the conditional is true, so if there had been a B on the card, then there would also have to have been a 3. This argument is equally valid, if less immediately obvious. What all psychological studies of such abstract inferences show is that the MP inference is made nearly 100% of the time, while the MT inference is only endorsed about 60% of the time in the same experiments (see Figure 7.3).

The other two arguments shown in Table 7.2 are fallacies. That is, the conclusions given do not *necessarily* follow. If we assume that the number is 3, it does not necessarily follow that the letter is a B (Affirmation of the Consequent, AC), because the conditional does not say that only B cards can have 3s. Similarly, if we know that the letter is not B, we cannot say that the number is not a 3 (Denial of the Antecedent, DA). And yet we see that university students endorse both of these fallacies about 40% of the time (see Figure 7.3). A likely reason for this is that people are making inferences that are *pragmatically* rather than logically implied. For example, the Denial of the Antecedent fallacy is endorsed much more often for conditional statements that express causal relationships or are used to make threats or promises (Newstead, Ellis, Evans, & Dennis, 1997). Consider the promise: if you wash my car, I will give you ten dollars. Most people will say that if you suppose the car is not washed then the ten dollars will not be paid (DA). Although not logically implied, this makes perfect sense in terms of the pragmatics of everyday conversation. The speaker wants the car washed and is providing an incentive: it would make no sense to pay someone who did not wash the car. A tip is weaker than a promise pragmatically because the speaker suggests an action will produce a desired outcome but has no actual control over it. An example might be 'if you wash Dad's car, he will give you ten dollars.' The frequency of endorsing even Modus Ponens drops significantly when a tip is substituted for a promise, as do *all the other* conditional inferences.

Figure 7.3: Percentage frequencies of endorsement of the four conditional inferences with abstract materials (weighted average of 11 studies reported by Evans et al., 1993, Table 2.4, total N = 457). Key: MP – Modus Ponens, DA – Denial of the Antecedent, AC – Affirmation of the Consequent, MT – Modus Tollens.
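The validity pattern of the four inferences can be checked mechanically. The sketch below (illustrative Python, not part of the chapter) enumerates the four possible truth-value combinations of p and q under the material reading of 'if' assumed by standard logic, and tests whether each conclusion holds in every world where the premises hold.

```python
from itertools import product

def material(p, q):
    """Truth-functional conditional: false only when p is true and q is false."""
    return (not p) or q

def valid(minor, conclusion):
    """Valid iff the conclusion holds in every world where the major
    premise 'if p then q' and the minor premise both hold."""
    return all(conclusion(p, q)
               for p, q in product([False, True], repeat=2)
               if material(p, q) and minor(p, q))

print(valid(lambda p, q: p,     lambda p, q: q))      # MP: True (valid)
print(valid(lambda p, q: not q, lambda p, q: not p))  # MT: True (valid)
print(valid(lambda p, q: q,     lambda p, q: p))      # AC: False (fallacy)
print(valid(lambda p, q: not p, lambda p, q: not q))  # DA: False (fallacy)
```

Note that this captures only the logic of Table 7.2; it says nothing about the pragmatic inferences people actually draw, which is the point of the experiments discussed above.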

There are many experiments published on how beliefs influence conditional inferences, far too many to discuss here (for reviews, see Evans et al., 1993; Evans & Over, 2004; Nickerson, 2015). All of them show that belief affects conditional reasoning in very significant ways when logically equivalent inferences are presented with different problem content.

#### 7.1.3 The Wason Selection Task

Peter Wason was a British psychologist who is regarded as the father of the modern psychology of reasoning. Most of his influential work was published between about 1960 and 1980, including a book which helped to identify the psychology of reasoning as a research field (Wason & Johnson-Laird, 1972). Of lasting importance was his invention of several novel tasks for studying reasoning, the most influential of which has been the four-card selection task. Strictly speaking, the selection task does not meet all the definitions of the deduction paradigm as I have given them, as it involves hypothesis testing as well as reasoning. However, the task is focussed on the logic of conditional statements and has been extensively studied by the same research community that has studied conditional inferences and other more conventional reasoning tasks. It has also been used to address broadly the same set of theoretical issues.

A typical standard abstract form of the problem is presented in Figure 7.4.

Figure 7.4: The standard abstract Wason selection task.

The generally accepted correct answer is to choose the A and 7 cards, although few participants make these selections. Most people choose either the A card alone, or the A and the 3. Wason pointed out that the conditional statement can only be falsified if there is a card which has an A on one side and does not have a 3 on the other. Clearly, the A card must be turned over: it should have a 3 on the back, and if it does not, this disproves the claim. Similarly, the 7 card – which is not a 3 – could have an A on the back, which would also disprove the statement. Turning the 3 is unnecessary, as it cannot disprove the rule: it need not have an A on the other side.
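The logic of the correct selection can be made concrete with a small sketch (illustrative Python, not from the chapter). A card must be turned just in case some possible hidden face would make it the falsifying combination: an A paired with a non-3. The visible faces are assumed here to be A, D, 3 and 7; the non-A letter shown in the actual Figure 7.4 may differ.

```python
from itertools import product

# Assumed card stock: each card has one of these letters on one side
# and one of these numbers on the other.
LETTERS, NUMBERS = ["A", "D"], [3, 7]

def must_turn(visible):
    """Turn a card iff some card consistent with its visible face would
    falsify 'if A on one side, then 3 on the other', i.e. an A with a non-3."""
    cards = [(l, n) for l, n in product(LETTERS, NUMBERS)
             if visible in (l, n)]
    return any(l == "A" and n != 3 for l, n in cards)

print([f for f in ["A", "D", 3, 7] if must_turn(f)])  # ['A', 7]
```

Only the A and 7 cards can possibly hide a falsifying combination, which is why they are the normatively correct choices.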

Why do people choose A and often 3 and ignore the 7? Wason originally suggested that they had a verification or confirmation bias. They were trying to prove the rule true rather than false, and hence looking to find a confirming combination of A and 3. In support of this account, if you ask people to give written justifications for their choices, they typically say that they were looking for 3 when turning the A and vice versa, because this would make the rule true (Wason & Evans, 1975). However, in an early research paper of my own, I showed that this account cannot be right. The trick is to include a negative in the second part of the conditional, as in the example shown in Figure 7.5. When the rule says, as in the example, that if there is a G on one side of the card, then there cannot be a 4 on the other side of the card, most people choose the G and 4 cards. But these choices do not verify the rule; they correctly falsify it. The combination that falsifies this statement is, of course, G and 4. Once again, participants say that they are turning the G to find a 4 and vice versa (Wason & Evans, 1975), but now in order to make the rule *false*. It is as though the negative has helped them to understand the logic of the problem.

Figure 7.5: The abstract Wason selection task with an added negation.

The effect here is called *matching bias*. People tend to choose the cards which match those named in the conditional, whether or not negations are present. But these negations affect the logic of the task, so it is a puzzling finding. Matching bias is another example of a cognitive bias, like the atmosphere effect, which operates with abstract materials. With the other methods, I showed that the introduction of realistic materials makes a big difference to responding. The same is true of the selection task. It was thought initially that simply using realistic materials made the problem a lot easier, with higher rates of correct selections (Wason & Johnson-Laird, 1972). This was known as the *thematic facilitation effect*. However, it was later shown that the versions that make the task really easy include a subtle change to the logic. An example, known as the 'drinking age rule,' is shown in Figure 7.6. People are first of all given a short context. In this case, they are asked to imagine that they are police officers enforcing rules. Then the rule is given, which requires beer drinkers to be over 18 years of age. Now most people will check the beer drinkers and those *under 18 years* of age. This is correct, as only underage drinkers can violate the rule. Experiments which give the drinking age rule find much higher rates of correct answers than with the standard abstract version.

As later authors pointed out, problems like the drinking age rule change the task from one of indicative logic (concerned with truth and falsity) to one of deontic logic, concerned with obeying rules and regulations. A number of different theoretical accounts have been offered to explain why the deontic version is so much easier. One idea is that we acquire and apply *pragmatic reasoning schemas*: rules which apply in certain contexts and can be instantiated with the content of a particular problem (Cheng & Holyoak, 1985). So people might solve the drinking age rule because they have a permission schema such as 'if an action A is to be taken, then condition C must be fulfilled', which could be instantiated as A = drinking beer and C = 18 years of age or older. The schema tells them that violations of this rule occur when the action is taken without the precondition being fulfilled. Other proposals include the use of innate evolved rules for social exchange (Cosmides, 1989), interpretation of the problem as a decision-making task in which people maximise perceived benefits (Manktelow & Over, 1991), and the role of pragmatic relevance for different forms of conditional statement (Sperber, Cara, & Girotto, 1995).

# 7.2 Theoretical Issues in the Psychology of Deduction

Having described the main methods and typical findings in the study of deduction, I now turn to some broader theoretical issues and arguments that have arisen.

#### 7.2.1 How We Reason: Rules or Models?

Despite the frequency of errors and biases, people do show some level of deductive competence on reasoning tasks, especially those of higher cognitive ability (Stanovich, 2011). For example, people endorse far more inferences that are valid than invalid on both syllogistic and conditional inference tasks. Some psychologists have focussed on the competence rather than the errors and asked questions about the mechanisms by which people draw deductions. One approach is often described as mental logic but should more accurately be described as mental rule theory (Braine & O'Brien, 1998; Rips, 1994). Traditional logics are usually presented as rules, but other techniques for generating valid deductive inferences are available. The term 'mental logic' was devised to distinguish the logic inside people's heads from that in philosophers' textbooks. The idea is that ordinary people reason by built-in logical rules. However, the psychological authors were mindful from the start of certain psychological findings. For example, in proposing a mental logic account of conditional inference, psychologists were well aware that people find Modus Ponens a lot easier than Modus Tollens. In standard logic, these would both be primitive rules of equal standing, but that cannot be the case in a mental logic.

Figure 7.6: The Deontic selection task: Drinking age rule.

Mental logicians have tried to address this problem by proposing that only Modus Ponens is a simple rule allowing direct inference. Modus Tollens can be drawn, but only by an indirect procedure called *reductio* reasoning. In this kind of reasoning, one makes a supposition p and tries to show that this leads to a contradiction, q and not-q. Since a contradiction cannot exist in logic, the supposition must be false, hence not-p follows. Consider our earlier example of a conditional which applies to cards with a letter on one side and a number on the other:

If the letter is B, then the number is 3

If people are told there is a B, then they can immediately apply their built-in rule for Modus Ponens and conclude that there is a 3. When told there is not a 3, however, they do not have a rule for Modus Tollens that can be applied in the same way. Instead, they have to reason as follows:

'If I imagine there is a B on the card, then there must be a 3. But I have been told there is not a 3 which is a contradiction. So I must have been wrong to suppose that there was a B on the card. Hence, I can conclude that there is not a B.'

This indirect reasoning is harder to execute and more prone to errors, explaining the lower acceptance rate of Modus Tollens. Hence a fully specified mental logic account consists of a set of direct rules of inference together with indirect reasoning procedures and can be implemented as a working computer program (Rips, 1994).

For many years now, rule-based mental logics have had a major rival, which is mental model theory, championed by Phil Johnson-Laird (1983, 2006; Johnson-Laird & Byrne, 1991) and followed and supported by many other psychologists. This theory also proposes that people are deductively competent in principle but fallible in practice, yet it does not rely on the application of mental rules. The core of the theory is the idea that people reason about possibilities, which are states of the world that might be true or false. These possibilities are represented by mental models. When the premises of an argument are presented, people are supposed to construct mental models to represent the possibilities, draw a conclusion that holds in those models, and check it by searching for counterexample models.
Consider the inference of disjunction elimination, discussed earlier. Given the major premise Either the letter is B or the number is 3, people construct the following models as possibilities:

B

3

B 3

Each line here represents a separate mental model. So this reads as: one possibility is that there is a B, the second that there is a 3 and the third that there is both a B and a 3.

Now given the minor premise, the number is not 3, the last two models are eliminated leaving only the possibility of B. Hence, people will conclude B as a correct deduction but without having any rule of disjunction elimination for the inference. The model theory of conditionals (Johnson-Laird & Byrne, 2002) is more complex and controversial (Evans, Over, & Handley, 2005). Consider the statement

If there is a B, then there is a 3

The full set of possibilities for this statement, according to the published theory, is the following:

B 3

not-B 3

not-B not-3


Given these possibilities, the minor premise B eliminates all but the first model, so the conclusion 3 follows (Modus Ponens). The minor premise not-3 eliminates all but the last model, so that not-B follows (Modus Tollens). However, like the mental-logic theorists, Johnson-Laird and Byrne were well aware of the relative difficulty of Modus Tollens. So they actually proposed that people initially represent the conditional statement as follows:

[B] 3

. . .

The first model means that in all cases of B (this is the meaning of the square brackets), there is a 3. The ellipsis '. . . ' means there are other possibilities. So if B is presented, people can immediately do Modus Ponens and conclude 3. However, if not-3 is presented, then they have to 'flesh out' the models to the fully explicit set given above in order to make Modus Tollens. Fleshing out is error prone, so people sometimes fail to make this valid inference. The theory has been applied to many other types of reasoning, including syllogisms.
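The elimination step at the heart of the theory can be sketched in a few lines (illustrative Python; the dictionary representation of models is my own choice, not the theory's notation). A model asserts truth values for the atoms it mentions; a categorical minor premise discards any model that explicitly asserts the opposite, while models that leave the atom unmentioned survive.

```python
def eliminate(models, atom, value):
    """Discard any model that explicitly asserts the opposite of the
    categorical minor premise (atom = value); unmentioned atoms survive."""
    return [m for m in models if m.get(atom, value) == value]

# 'Either the letter is B or the number is 3': three explicit models.
disjunction = [{"B": True}, {"3": True}, {"B": True, "3": True}]
print(eliminate(disjunction, "3", False))   # [{'B': True}]  -> conclude B

# Fully explicit models of 'if B then 3'.
conditional = [{"B": True, "3": True}, {"B": False, "3": True},
               {"B": False, "3": False}]
print(eliminate(conditional, "3", False))   # [{'B': False, '3': False}] -> not-B
```

The second call mirrors Modus Tollens with the fleshed-out conditional: only the not-B, not-3 model survives the premise not-3, so not-B follows without any rule of inference being applied.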

I think it fair to say that both mental rule and mental model theory are firmly rooted in the traditional deduction paradigm, as they both put an account of logical competence foremost and deal with effects of beliefs and pragmatics as add-ons. Whether belief rather than logic should be the focus of psychological accounts of reasoning is precisely the issue for researchers in the new paradigm, to which I return later. We next consider dual-process theory which is not directly concerned with how logical reasoning occurs, but rather with the idea that such reasoning competes with other kinds of cognitive processes of a more intuitive nature.

#### 7.2.2 Dual-process Theories of Reasoning

In 1982, I published my first book, which was a review of the psychology of deductive reasoning. Even at that time, with many of the key studies in the field yet to be conducted, it was clear that logical errors were frequent, and there was much evidence of systematic biases and belief-based reasoning. I was struck by what seemed to be two factors influencing many different reasoning tasks. People's choices were indeed influenced by the logic of the problems, just as had been originally expected, but also by non-logical factors that were completely irrelevant to the task, such as atmosphere or matching bias. But a two-factor theory is descriptive and provides no real explanation of the cognitive processes that underlie our observations.

An important theoretical leap is from dual sources to dual processes. What if the two factors reflect quite different mental processes? I mentioned earlier that Wason and Evans (1975) observed that when matching bias cued a correct choice on the selection task, people appeared to understand the logic of the problem and the importance of falsification. But in the standard version, they talked as though they could prove the rule true. In this early paper, we suggested a distinction between unconscious Type 1 processes responsible for the matching selections and conscious Type 2 reasoning, which simply served to rationalise or justify the unconsciously cued cards. The radical suggestion was that actual choices were determined by one kind of process but the verbal justifications by something entirely different. The next important step, in work also described earlier, was the belief-bias study of Evans et al. (1983). What we observed there was that syllogistic reasoning was influenced heavily by both the logic of the syllogism and the believability of the conclusion. We showed that the two factors were in conflict and that individuals would sometimes go with logic and other times with belief.

The linkage with the Wason and Evans work was not made immediately, but in retrospect the view that developed was that Type 2, explicit (slow, reflective) reasoning processes were responsible for the preference for valid over invalid conclusions. However, these competed with Type 1 (fast, intuitive) processes, which favoured believable over unbelievable conclusions. So on tasks other than the selection task, at least, Type 2 reasoning could solve problems and not just rationalise intuitions. (Later research showed that Type 2 processes have a role in selection task choices as well.) As different dual-process accounts were developed over the next quarter of a century, there was a particular emphasis on the idea that Type 2 reasoning was responsible for logically correct answers and Type 1 processing for non-logical effects, such as matching and belief bias (see Evans, 2007; Stanovich, 1999; see also Chapter 10, "Decision Making", for application of dual-process theory to decision making).

A large individual-differences programme was conducted by Keith Stanovich and Richard West, who showed that on a wide variety of reasoning and decision-making tasks, cognitive ability or IQ (see Chapter 14, "Intelligence") was strongly correlated with the ability to give the correct answer. The theoretical idea here is that people with higher IQs also have higher working memory capacity and are therefore more able to manipulate mental representations of premises and conclusions in order to reason logically. In fact, the engagement of working memory is now considered a defining feature of Type 2 processing (Evans & Stanovich, 2013). Other methodologies were developed that supported this view. For example, if people are given a very short time to respond, they are less likely to give the logical answer and more likely to show a bias, such as matching or belief bias. Similar results occur if a working-memory load has to be carried while reasoning (for reviews see Evans, 2007; Evans & Stanovich, 2013). However, Stanovich and West also showed that general intelligence is not the only individual-difference factor in human reasoning. In particular, people vary in rational thinking disposition, which measures the inclination to accept an intuitive answer or to check it out by high-effort reasoning.

Some dual-process theorists have suggested that there are rule-based (Type 2) and associative (Type 1) processes that operate in parallel (Sloman, 1996), but more popular among deductive-reasoning researchers is the idea that fast intuitive (Type 1) answers come to mind immediately and are subject to checking and possible revision by slower, reflective (Type 2) processes that follow (Evans & Stanovich, 2013; Kahneman, 2011). This leads to the important question of why it is that some intuitive answers are more carefully checked than others. A hot topic in the field right now is whether people's initial answers are accompanied by feelings of rightness, which help them decide whether to accept the intuitive answer or whether to check it out with careful reasoning (Hot Topic, see also Chapter 6, "Metacognition").

# 7.3 The New Paradigm Psychology of Reasoning

The deduction paradigm was developed about 50 years ago to assess the then-prevalent view that logic was the basis of rational reasoning. As evidence of logical errors, cognitive biases, and belief-based reasoning accumulated, it presented a clear problem both for psychologists and for philosophers who became aware of the findings of research on deduction, as well as of similar findings with other kinds of human reasoning. The problem was that, by the original assumptions, people were turning out to be irrational. Peter Wason, for example, was quite clearly of the view that people were illogical and therefore irrational (see Evans, 2002). In a famous paper, the philosopher Jonathan Cohen argued that people were in fact inherently rational and that psychological experiments could never prove otherwise (Cohen, 1981). He suggested that the experiments were unrepresentative or being misinterpreted. He also pointed out that standard logic is not the only kind that logicians have offered: there could be alternative normative accounts of how to be rational. A normative theory is a theory of how people *ought* to reason. Subsequently, a number of psychologists engaged with the issue of what counts as rational reasoning (e.g. Evans & Over, 1996; Stanovich, 1999).

A major issue is whether traditional logic provides the correct standard for human reasoning. One major research programme, that of Mike Oaksford and Nick Chater, has disputed this from the start (Oaksford & Chater, 2001, 2007). Their first important contribution was an alternative normative account of the Wason selection task, arguing that the typical answer can be seen as rational from a decision-making perspective (Oaksford & Chater, 1994). Like Cohen, they took the view that human behaviour must be rationally adapted to the environment, and that if a standard normative account does not explain it, then we should look for another. They have presented various theories of reasoning tasks based on probability theory and decision theory. Naturally this approach is controversial, and it has been branded Panglossian by some authors (Stanovich, 1999). Pangloss was a fictional philosopher in a novel by Voltaire who was prone to say 'all is for the best in the best of all possible worlds'!

The essence of the new paradigm is that people naturally reason from their beliefs about the world and that this should not be treated as an error or cognitive bias. Strong deductive reasoning instructions are artificial: they require people to ignore what they believe for the sake of the experiment. Other methods have been explored. For example, people can be asked what inference follows from some information and allowed to express degrees of belief in their conclusions. A key feature of the new paradigm is the proposal of the *suppositional conditional*, also known as the probability conditional (Evans & Over, 2004; Oaksford & Chater, 2001). The conditional statement of standard logic is equivalent to a disjunction. For example,

If the letter is B, then the number is 3

is true except when we have a B and not a 3. Hence it is equivalent in meaning to

Either the letter is not a B or the number is a 3

This is what logicians term the *material conditional*. However, many philosophers have rejected the material conditional as an account of the ordinary conditional of everyday language (Edgington, 1995). This is because it leads to unacceptable inferences. The material conditional, if p then q, is true whenever p is false or q is true. So the following statements must be true

If President Trump is French, then Paris is the capital of the USA

If 2+2 = 5, then 3 is a prime number

It is clear that no normal person would endorse these statements.
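The paradox can be verified truth-functionally with a short sketch (illustrative Python, not from the chapter): whenever the antecedent is false, the material conditional comes out true, and 'true except when p and not q' is exactly the disjunction given above.

```python
from itertools import product

def material(p, q):
    """The material conditional: false only when p is true and q is false."""
    return (not p) or q

# 'If President Trump is French, then Paris is the capital of the USA':
# both parts are false, so the material conditional is true regardless.
print(material(False, False))   # True

# 'True except when we have p and not q' is equivalent to 'not p or q'.
print(all(material(p, q) == (not (p and not q))
          for p, q in product([False, True], repeat=2)))   # True
```

This mechanical truth of conditionals with false antecedents is precisely what philosophers find unacceptable about the material reading.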

If the ordinary conditional is not material, then what is it? The philosopher Ramsey famously argued that belief in the ordinary conditional if p then q is, in effect, the probability that q will be true if p is (Ramsey, 1931/1990). He also suggested that we assess this by adding p to our current stock of beliefs and arguing about q on that basis. This is known as the Ramsey test, and we suggested that conditional statements are suppositional – they depend on the supposition of p (Evans & Over, 2004). Let us consider some examples:

If teachers' pay is raised, then recruitment will increase

Many people will agree with this statement or assign a high probability to it. They do this by supposing first that teachers' pay is in fact raised and then using other beliefs to calculate the likelihood that teachers will prove easier to recruit. They may be aware that recruitment has been difficult in recent years and that one factor is almost certainly that salary levels have fallen behind those of workers in other professions. So, they believe that a financial incentive will help address the issue. In doing this, they ignore any beliefs they have about what will happen if pay is *not* increased, which they regard as irrelevant. The Ramsey test is also related to the findings with Modus Ponens mentioned earlier. When people believe a conditional statement to be true, they also believe that q is probable when p is assumed and so will readily infer q from p. Consider, however, this statement:

If the global economy grows then there will be less poverty and starvation in the world

The Ramsey test will not produce a high level of confidence in this conditional for many people. They may believe, for example, that growth in the global economy increases wealth for rich individuals and rich countries but is not likely to be distributed to the third world, where most of the poverty and starvation is concentrated. If their belief in the conditional is low, they will also be reluctant to draw inferences from it, even the apparently obvious Modus Ponens. There is now much evidence that, with real-life conditionals like these, people do indeed assign very similar belief levels to if p then q as they do to the probability of q given p (e.g. Over, Hadjichristidis, Evans, Handley, & Sloman, 2007). In people's minds, the conditional applies only on the supposition that p is true, and is otherwise irrelevant. This is not consistent at all with the material conditional of standard logic.
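The gap between the suppositional (Ramsey) reading and the material reading can be shown with a toy joint probability distribution over p and q (the numbers below are invented purely for illustration):

```python
# Toy joint distribution over (p, q); the probabilities are made up
# for illustration, not taken from any study in the chapter.
joint = {
    (True, True): 0.10,   # p and q
    (True, False): 0.10,  # p and not-q
    (False, True): 0.30,
    (False, False): 0.50,
}

p_p = sum(v for (p, _), v in joint.items() if p)
p_q_given_p = joint[(True, True)] / p_p  # suppositional/Ramsey reading
p_material = 1.0 - joint[(True, False)]  # P(not-p or q), material reading

print(f"P(q | p)               = {p_q_given_p:.2f}")  # 0.50
print(f"P(material conditional) = {p_material:.2f}")  # 0.90
```

The material reading is inflated by all the cases where p is false, which the Ramsey test treats as simply irrelevant; this is the divergence the experimental belief-rating results track.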

The essence of the new paradigm is to view belief-based reasoning as natural and rational, rather than to consider it necessarily a source of bias. The new paradigm is, however, not yet as clearly defined as the old. Many authors are pursuing alternative normative theories of reasoning to standard logic or seeking explanations in terms of Bayesian decision theory – a system which takes account of subjective beliefs. Others have argued that the new paradigm should concern itself less with normative theory and more with describing what people actually do when they reason. One thing that all agree upon is that traditional standard logic is neither a good account of how we actually reason, nor of how we ought to reason.

#### Summary


#### Review Questions


Hot Topic: Feelings of rightness in reasoning and implications for dual-process theory

Jonathan Evans

The standard dual-process approach assumes that an intuitive answer to reasoning (and decision making) problems comes to mind quickly due to Type 1 processing. It is then subject to checking by Type 2 processes, which may rationalise the intuitive answer or substitute more complex reasoning to provide a different answer. An important question is why we sometimes rely on the initial intuition with minimal reasoning and at other times engage Type 2 processes. Some known factors are the processing style of the individual and the time available for thinking. However, an additional factor has been proposed by Valerie Thompson and her colleagues – metacognitive feelings. The initial intuition comes with a *feeling of rightness (FOR)*, which could determine whether we accept it or expend effort on reasoning (Thompson, Prowse Turner, & Pennycook, 2011). Thompson has claimed in several papers that FOR is the key factor in determining whether extensive Type 2 processing occurs.

Thompson invented a methodology called the two-response task (Thompson et al., 2011). Participants are given a reasoning or decision task and asked to generate an initial answer as quickly as possible without reflecting on it. They then rate the degree to which they are confident that the answer is correct – the FOR. Following this, they are asked to think again about the problem, taking as long as they like. After this, they again give a response to the problem, which may or may not be the same as the first answer. Using a range of different tasks, the following pattern was established: when FOR is high, the initial answer tends to be given quickly, rethinking time tends to be short, and the second response usually matches the first. In other words, when we are intuitively convinced of our original answer, we expend little effort in trying to check or correct it. Conversely, when FOR is low, we are more likely to take time rethinking the problem and to change our original answer.

There are some unresolved difficulties with this account. First, none of these studies reports a relation between FOR and actual accuracy. We are no more likely to be confident of a correct answer than of a biased one, and we are just as likely to change a right answer to a wrong one after reflection as the other way around. So if FOR has evolved to help us make good decisions, using our cognitive resources effectively, why is it not helping us identify errors? In fact, the opposite seems to be true. Matching bias and belief bias, for example, have been shown to be supported by false feelings of rightness. Matching cards and believable conclusions *feel* right, even when they are wrong. Another difficulty is that there is no direct evidence for a causal connection between FOR and Type 2 reasoning, as Thompson and colleagues claim. Everything is in fact correlational. All we really know is that answers with high FOR are also made more quickly, thought about less, and changed less often.

There are also some recent empirical findings which raise difficulties for the dual-process story here. On relatively simple tasks, when a correct choice is put in conflict with a bias, there is evidence that this conflict is detected very rapidly by the brain, indicating that some kind of 'logical intuition' (De Neys, 2014) is available to conflict with the bias. Recently, using the two-response task with syllogistic reasoning, Bago and De Neys (2017) showed that most people who were correct at time 2 were also correct at time 1. There was little evidence for people correcting intuitive responses by a period of reflection, as might be expected with Type 2 intervention. However, it is possible that studies of this kind provide a misleading impression, as the tasks are relatively simple. Hence, it could be that 'logical intuitions' arise from Type 1 rather than Type 2 processing, as these solutions do not require the engagement of working memory on the tasks used (Evans, 2018). As is the nature of hot topics, there is as yet no clear resolution to the questions I have raised, and research continues.

#### References


Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. *Cognitive Psychology*, *63*(3), 107–140. doi:10.1016/j.cogpsych.2011.06.001

### References





# Glossary


# Chapter 8

# Inductive Reasoning

#### JANET E. DAVIDSON

Lewis & Clark College

Inductive reasoning involves inferring likely conclusions from observations or other forms of evidence. For example, if your car starts making loud clunking noises, you might conclude that it probably has a serious and expensive problem. Without reasoning in this manner, the world would be a more primitive and confusing place. It would be impossible to develop scientific theories, forecast the weather, persuasively argue legal cases, learn from mistakes, make predictions about how a best friend will behave around strangers, use previous experiences to help transition to a new job, or think critically before making important decisions. Furthermore, people would probably be less creative (Goswami, 2011; Vartanian, Martindale, & Kwiatkowski, 2003). It even takes inductive reasoning to predict how life would be different without it. In short, this form of reasoning expands and deepens world knowledge, helps social interactions, and allows us to adapt to new environments (Heit, 2000; Rhodes, Brickman, & Gelman, 2008).

As implied above, induction is viewed as the most common form of reasoning that people use in their daily lives (Hayes & Heit, 2017). It can occur so automatically that we are often unaware of quickly using examples, observations, and other existing knowledge to draw conclusions or make predictions about the future. It might seem odd that we frequently rely on inductive reasoning, given that the conclusions from it are never guaranteed to be correct. The data on which they are based are always potentially incomplete or perhaps even flawed, which means the conclusions at their very best can only have a high probability of being right. There is always the chance that they will need to be modified or even rescinded in the future after new evidence is acquired.

Although induction never provides definitive answers, we habitually use it for two primary reasons related to being human. First, we categorize, infer causality, and reason by analogy in an attempt to explain and manage almost everything that happens around us (Feeney & Heit, 2007). For example, people who live near different kinds of dogs might use size, breed, and amount of barking to infer which ones will be friendly and which ones to avoid. Second, we reduce our uneasiness about an uncertain future by using past experiences to predict upcoming events or outcomes (Murphy & Ross, 2007). For example, if you take a new course from a professor who taught two of your other courses, you might reasonably assume that the same study skills that worked well for you in the other two courses will work well in this one.

Why is it important to learn about inductive reasoning? Knowledge about this topic provides us with insight into how humans use limited data to make rational inferences and how, across our lifespans, we generalize from the known to the unknown (Hayes & Heit, 2017). On a personal basis, it helps us learn how to construct strong persuasive arguments that could convince others to adopt our point of view (Jones, 2013). Additionally, induction is a major component of other fundamental cognitive activities, such as decision-making, categorization, and similarity judgments (Heit, 2007). In other words, we cannot comprehend the full range of how humans think and behave if we do not understand when and how inductive reasoning is performed.

Given that inductive reasoning is a central part of being human, it has been examined through a wide variety of approaches. If you are a philosopher like David Hume (1739), you might conclude that inductions are a questionable habit of the mind because using past experiences to predict an unknown future is not logically justifiable. In other words, it is not rational to assume that life twenty years from now will closely resemble the world today. Furthermore, according to Goodman's new riddle of induction, if there is more than one possible future, it is not clear how best to distinguish which one to select (Goodman, 1983). Goodman's answer to his own riddle is that we make projections that are entrenched or well established simply because they are familiar and may have worked in the past. However, there are several psychological approaches to inductive reasoning that are data-driven and examine induction through a wide range of problems, methodologies, models, and developmental perspectives. This rich collection of research has increased our knowledge about the cognitive processes used to reach probabilistic conclusions and how these processes and their regularities relate to other forms of thinking and problem solving (Heit, 2007).

This chapter is divided into five sections. The first compares induction with deduction, the other commonly used type of reasoning. It then examines the attributes that help create strong inductive arguments, followed by descriptions of some different forms of induction. The fourth section reviews how inductive reasoning develops in children. Finally, the chapter's main points are summarized.

# 8.1 Comparing Inductive Reasoning with Deductive Reasoning

In the past, inductive reasoning has primarily been understood by contrasting it with deduction (Heit, 2007; see Chapter 7 for an in-depth review of deductive reasoning). Inductive reasoning is sometimes described as "bottom-up" logic because specific observations are often used to draw general conclusions or principles that explain the evidence (Johnson-Laird, 2000). For example, after observing that students who show up on time for my classes tend to perform better than ones who arrive late, I might induce that effective time management is a crucial component of academic success. In contrast, deductive reasoning is sometimes defined as the opposite because it often uses "top-down" logic to reason from general principles to derive specific conclusions (Johnson-Laird, 2000). For example, given the premises that every first-year student at my small college must live on campus and Brenda is a first-year student, I deduce that she lives on campus, which means I know how to track her down.


Table 8.1: Key differences between inductive and deductive reasoning.

However, it is important to note that the distinction between bottom-up and top-down processing has recently been viewed by some researchers as too simplistic because it does not apply to all cases of induction and deduction (Feeney & Heit, 2007). For example, inductive reasoning sometimes results in specific conclusions. Consider the following problem: "It rained in Seattle yesterday. It rained in Seattle today. Will it rain in Seattle tomorrow?" This scenario requires the solver to determine a probabilistic conclusion that is specific rather than general. Furthermore, there are problems that some people solve inductively and others solve deductively, which means that problem type cannot be used to determine which form of reasoning is being used. Consider, for example, the task of buying a new car. One individual might observe which models are commonly or rarely found in car repair shops before inducing which type of car seems to need the least amount of mechanical work. Another individual might use the premises that "All cars made in Japan are good cars and Toyotas are made in Japan" to deduce that a Toyota is a good car that would be worth buying. Evidence that some problems can be solved either inductively or deductively has resulted in the process view of reasoning (Heit, 2007). Instead of the traditional procedure of using type of problem to determine the form of reasoning being applied, the focus is on the mental processes that each individual employs.

What are the current key differences between inductive and deductive reasoning? Unlike induction, deduction can be independent of external knowledge of the world or may even contradict such knowledge (Goswami, 2011). The conclusion is completely derivable from the premises and additional information is not required. Consider, for example, the following problem: "All dogs fly. Fido is a dog. Does Fido fly?" The correct answer that Fido flies is unsound and counterfactual but logically valid because the premises, despite being false, require the conclusion to be logically true. If the problem had been "No dogs fly. Fido is a dog. Does Fido fly?" the correct answer that Fido does not fly is sound because it is logically valid and the premises, in reality, are true. In contrast, inductive arguments are dependent on world knowledge rather than on formal rules of logic, and they are viewed on a continuum of weak to strong, rather than on the dichotomy of logically valid or invalid (Foresman, Fosl, & Watson, 2017). Extremely weak arguments have so little support that conclusions drawn from them are quite unlikely to be true. Ones that are quite strong are based on relevant, substantial, and compelling evidence. However, even if the premises or arguments are accurate and convincing, we cannot know that new information will not be found that overturns our earlier conclusions. In other words, inductive conclusions can be thought of as educated guesses based on our current knowledge. Deductively valid conclusions are not guesses; they are guaranteed to be logically true. Furthermore, inductive reasoning tends to happen more quickly, intuitively, and automatically than deductive reasoning, which often requires more conscious, analytical processing (Heit & Rotello, 2010). Some research has also found more activation in the brain's left prefrontal cortex during inductive reasoning than during deductive reasoning (Hayes, Heit, & Swendsen, 2010). Table 8.1 summarizes key differences between these two forms of reasoning.

There are several similarities between induction and deduction that deserve recognition. For example, both involve evidence, logic, and working memory, and both are central to critical thinking (Foresman et al., 2017; Süß, Oberauer, Wittmann, Wilhelm, & Schulze, 2002). Both are used in the scientific method. As John Steinbeck (1954) describes,

Everyone knows about Newton's apple. Charles Darwin and his *Origin of Species* flashed complete in one second, and he spent the rest of his life backing it up; and the theory of relativity occurred to Einstein in the time it takes to clap your hands. This is the greatest mystery of the human mind–the inductive leap. Everything falls into place, irrelevancies relate, dissonance becomes harmony, and nonsense wears a crown of meaning. (p. 20)

Although Steinbeck undoubtedly over-estimated the frequency of true inductive leaps, inductive reasoning is used to form hypotheses and theories that advance scientific knowledge. However, this is not nearly enough; scientists then need to use deductive reasoning to test their hypotheses and theories on specific situations in order to verify their accuracy. In addition, both forms of reasoning are continuous across our lifespans and are susceptible to similar heuristics and biases (Goswami, 2011).

# 8.2 Inductive Reasoning at Its Best

As noted earlier, we can never be 100% certain that our inductive conclusions are right. However, by keeping the following attributes in mind, we can reduce errors and biases, which increases the likelihood that our inductive arguments are strong and the conclusions are warranted, justifiable, and have a high probability of being true.

# 8.2.1 A Sizeable Sample

A large number of observations typically increases the strength of inductive arguments, which makes the conclusions more likely to be accurate (Nisbett, Krantz, Jepson, & Kunda, 1983). Consider the following two examples:

*Observations*: Every morning for the past 8 months, George drank a large glass of milk and thirty minutes later his stomach consistently started hurting.

*Conclusion*: George has lactose intolerance.

*Observation*: Natalie ate a peanut and then had trouble breathing.

*Conclusion*: Natalie has a peanut allergy.

The first example has a stronger argument than the second because it is based on approximately 250 observations or pieces of evidence rather than only one. Although there are exceptions (Osherson, Smith, Wilkie, Lopez, & Shafir, 1990), a large sample size helps maximize information, reduces distortions in the evidence, and makes the conclusions more likely to be correct (McDonald, Samuels, & Rispoli, 1996; Nisbett et al., 1983).

The importance of a large sample is highly relevant to both scientific research and inductive reasoning in our daily lives. Given that there are individual differences in human behavior, psychological research, in particular, needs to have a large number of participants and numerous experimental trials or survey questions in order for the data to be robust and trustworthy. Non-scientists have also been found to pay attention to number of observations, especially when making inductions about highly variable attributes. For example, Nisbett and his colleagues (1983) asked college students to estimate the percentage of obese male members of the Barratos tribe if they observed one obese tribesman, three obese tribesmen, or twenty of them. Results showed that participants were least likely to make strong inferences based on only one tribe member; conclusions were strongest for the highest number of observations. This finding is known as premise monotonicity, which means that a higher number of inclusive premises results in a stronger inductive argument than a smaller number (Osherson et al., 1990). However, as will be explained in the section on representativeness, individuals do not always take sample size into account as much as they should.

# 8.2.2 Diverse Evidence

Although the milk example presented earlier involves numerous observations, the conclusion that George has lactose intolerance would have a higher probability of being true if it were based on a wide range of evidence, such as George getting stomach aches after consuming other lactose-based foods, George's health history, and observations conducted at different times of the day and night. In other words, inductive arguments are stronger if they present a range of converging evidence taken from different sources (Heit, 2000). For this reason, scientific experiments are often conducted in various ways using different types of participants in order to test a single hypothesis. For example, inductive reasoning has been studied using objects, cartoon pictures, complex verbal arguments, computational models, and participants of different ages from various backgrounds (Heit, 2007).

Given that it would be time-consuming and often impossible to collect every possible observation, people often use shortcuts or heuristics to reach inductive conclusions (see Chapter 10, "Decision Making," for more information about heuristics). In many cases these heuristics can result in quick and highly probable conclusions. Unfortunately, sometimes they can cause errors. One of these heuristics or "rules of thumb" is the availability heuristic, which can undermine our diversity of evidence. We tend to use information that easily comes to mind, without also considering a significant number of cases that take longer to retrieve from memory. For example, when Amos Tversky and Daniel Kahneman (1973) asked people to predict whether there are more words in the English language beginning with the letter R or more words with R as the third letter, 69% of their participants erroneously predicted that more words begin with R. In other words, it is easier to generate words like "rutabaga", "rat", and "ridiculous" than it is to think of instances like "bard", "certify", and "dare." According to Tversky and Kahneman, "to assess availability it is not necessary to perform the actual operations of retrieval or construction. It suffices to assess the ease with which these operations can be performed" (p. 208). However, other researchers believe information must be retrieved from memory because it is used to guide and evaluate inductive inferences (Shafto, Coley, & Vitkin, 2007).

Moreover, the availability heuristic applies when people easily retrieve information from memory that indicates there is a relationship between events, categories, or attributes. They then base their inductive conclusions on this perceived correlation, which can often be quite useful. For example, if you remember that your professor has granted requests for paper extensions when he or she is in a good mood, you might take this information into account when you want permission to turn your paper in late. However, the availability heuristic can sometimes result in an illusory correlation, which means that people believe a relationship exists when, in reality, it does not (Hamilton & Lickel, 2000). For example, prejudicial conclusions are sometimes drawn when individuals have information readily available in memory that leads them to believe there is a correlation between

negative personality traits and a particular group of people. In short, making predictions based only on easily retrievable evidence can result in wrongly assuming correlations exist, which lowers the strength of our arguments and reduces the likelihood that our inductive conclusions are correct.

Having a range of evidence, if it is chosen correctly, can also help prevent confirmation bias. We have a tendency selectively to seek data that supports our hypotheses, while overlooking information that would invalidate them. Suppose someone gave you the numbers 2, 4, and 6 that conform to a rule and asked you to discover the rule by generating sets of three numbers you think would fit. What do you think the rule is and which three numbers would you select to test your hypothesis? When Peter Wason (1960) gave this task to adults, his nonobvious rule was "three numbers in increasing order of magnitude" but most participants assumed the rule was "increasing intervals of two". Box 8.1 shows his instructions and examples of different ways of responding. Thirty-one percent of the participants practiced what Wason refers to as enumerative induction; they did not try to disconfirm their hypothesis by testing odd numbers or descending ones. As a result, they never discovered the correct rule. In science and in everyday life, it is essential to practice eliminative induction by seeking both confirming and disconfirming evidence before drawing conclusions.

The availability of different types of knowledge to inform our inductive reasoning is dynamic; it can change based on context and the effects of prior experience (Shafto et al., 2007). More specifically, the information in the premises of an inductive problem can have an immediate consequence for which knowledge we retrieve from memory in order to make our generalizations. If we are told that dogs have a recently discovered illness, we might infer that cats will get it too because we remember they often live in the same households. However, if we discover that dogs have a recently discovered gene, we would be more likely to conclude that wolves also carry it because we remember that the two species are genetically closely related. In contrast, prior experience has long-term consequences for knowledge availability. For example, novices in a domain are more likely to retrieve taxonomic or categorical information than are domain experts, who tend to rely more on causal, thematic, and ecological relationships. Interestingly, if put under time pressure, experts often fall back on using taxonomic similarity to draw their conclusions (Shafto, Coley, & Baldwin, 2007).

# Textbox 8.1: Examples of Enumerative and Eliminative Induction on the 2-4-6 task

#### Instructions

You will be given three numbers which conform to a simple rule that I have in mind. This rule is concerned with a relation between any three numbers and not with their absolute magnitude. . . . Your aim is to discover this rule by writing down sets of three numbers, together with reasons for your choice of them. After you have written down each set, I shall tell you whether your numbers conform to the rule or not. . . . There is no time limit but you should try to discover this rule by citing the minimum sets of numbers. Remember that your aim is not simply to find numbers which conform to the rule, but to discover the rule itself. When you feel highly confident that you have discovered it, *and not before*, you are to write it down and tell me what it is (Wason, 1960, p. 131).
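The difference between enumerative and eliminative testing on the 2-4-6 task can be sketched in a few lines of Python (the sample triples are ours, chosen for illustration):

```python
# Wason's hidden rule: any three numbers in increasing order.
def conforms(triple):
    a, b, c = triple
    return a < b < c

# Enumerative testing: only triples that fit the "+2 intervals" hypothesis.
confirming_tests = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
# Eliminative testing: triples chosen so they could falsify that hypothesis.
disconfirming_tests = [(2, 4, 10), (6, 4, 2), (1, 2, 3)]

print([conforms(t) for t in confirming_tests])     # [True, True, True]
print([conforms(t) for t in disconfirming_tests])  # [True, False, True]
```

The confirming triples all pass, so they can never distinguish "increasing intervals of two" from the true rule; the feedback that (2, 4, 10) also conforms is what tells an eliminative tester that the "+2" hypothesis must be wrong.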


# 8.2.3 Representative Observations

To achieve strong inductive arguments, it is not enough to have several observations that include diverse evidence. The observations must also fully represent the entire population or category of interest. For example, suppose you wanted to predict whether the citizens of California believe the legal drinking age should be lowered from 21 to 18. Polling undergraduates at several universities in California would probably tell you more about college students than it would about the beliefs of people in the entire state. To draw potentially accurate conclusions about drinking attitudes in California, it would be important to obtain opinions from a representative cross-section of the people who live there. Similarly, results from scientific studies can only be safely generalized to the population represented in the sample of participants. Experiments conducted on mice, college students, males, or individuals from a specific culture are often replicated using members of other populations so that the conclusions can encompass other species or all humans.

In solving inductive reasoning problems, individuals often use the representativeness heuristic. When trying to estimate the probability of an event, this shortcut involves finding a comparable case or prototype and assuming that the two events have similar probabilities. Consider a problem developed by Tversky and Kahneman (1974): "Steve is very shy and withdrawn, invariably helpful, but with little interest in people, or in the world of reality" (p. 1124). Is Steve more likely to be a farmer, librarian, salesman, airline pilot, or physician? Using the representativeness heuristic, people are likely to respond that Steve has the highest probability of being a librarian because he best fits how they view a typical librarian. However, this conclusion can be inaccurate if important base-rate information is not taken into account. At the time the study was conducted, there were more male farmers than male librarians in the United States.
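Why base rates matter can be shown with a small Bayesian calculation (the numbers below are hypothetical, chosen only to illustrate the effect; they are not from Tversky and Kahneman's study):

```python
# Hypothetical numbers for illustration only: suppose there are 20
# male farmers for every male librarian, and the "shy and withdrawn"
# description fits 90% of librarians but only 10% of farmers.
prior = {"librarian": 1 / 21, "farmer": 20 / 21}
likelihood = {"librarian": 0.9, "farmer": 0.1}

evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}

# Despite the 9:1 likelihood favouring "librarian", the 20:1 base
# rate keeps "farmer" the more probable occupation (~0.69 vs ~0.31).
print(posterior)
```

The representativeness heuristic attends only to the likelihoods (how well Steve matches each stereotype) and ignores the priors, which is exactly the base-rate neglect the example describes.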

In addition, people do not always take small sample sizes into account to assess representativeness. One demonstration of this is another study conducted by Tversky and Kahneman (1974). Ninety-five participants were asked the following question:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50 percent of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50 percent, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60 percent of the babies born were boys. Which hospital do you think recorded more such days? (p. 1125)

The answer options were (1) the larger hospital, (2) the smaller hospital, or (3) about the same (within 5 percent of each other) for the two hospitals. Over half of the participants predicted the recordings would be about the same, presumably because they assumed that both hospitals would be equally representative of male and female birth rates in the general population. However, the correct answer is the smaller hospital because about 15 babies born each day will show more fluctuation in the number of males and females born than will the bigger sample size at the larger hospital, which is more likely to reflect the statistic found in the general population.
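The hospital result can be verified with an exact binomial calculation (a sketch added for illustration; the function name and code are ours, not from the chapter):

```python
from math import comb

def prob_more_than_60_boys(n, p=0.5):
    """Exact probability that boys exceed 60% of n births on a given day.

    The threshold test uses integer arithmetic (5*k > 3*n is equivalent
    to k/n > 0.6) to avoid floating-point trouble at the exact-60% boundary.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if 5 * k > 3 * n)

for n in (15, 45):
    prob = prob_more_than_60_boys(n)
    print(f"{n} births/day: P(>60% boys) = {prob:.3f}, "
          f"about {prob * 365:.0f} days/year")
```

The smaller hospital's daily proportion of boys fluctuates more, so it records substantially more such days per year, which is why "the smaller hospital" is the correct answer.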

# 8.3 Different Forms of Inductive Reasoning

Given that induction is central to our daily lives as we engage in a variety of activities, it is not surprising that there are different ways we use it. Four of these will be covered in this section.

# 8.3.1 Category-based Induction

Category-based induction has probably been studied more than any other form of inductive reasoning (Heit, 2007). In this type of reasoning, if people are told that one or more members of a category have a certain property, they then determine whether other members of the category are likely to have the same property. For example, if you observe that chimpanzees groom each other, you would probably infer that gorillas have the same behavior. Would you also conclude that groundhogs groom each other?

#### 8.3.1.1 Premise Typicality

In a classic study conducted by Lance Rips (1975), participants were told that a particular species on an isolated island had a new contagious disease and were then asked to estimate the likelihood that other kinds of animals on the island would contract the disease. Results indicated that species' typicality had a large influence on individuals' inductive judgments, even when similarity was held constant. In other words, if one species (e.g., a robin) was highly representative of an inferred superordinate category (e.g., birds), individuals were more likely to generalize to other members (e.g., sparrows) than if the same information was given about an atypical member (e.g., a canary). It is more convincing to project the robins' disease onto sparrows than it is to generalize the disease from canaries to sparrows.

There is also premise-conclusion asymmetry, which means a single-premise argument is viewed as stronger if the more typical member of an inferred superordinate category is used in the premise rather than in the conclusion. For example, it is more convincing to project a property of lions onto bats than the other way around because lions are viewed as a better prototype of mammals than are bats (Smith, Shafir, & Osherson, 1993).

#### 8.3.1.2 Category Similarity

Two categories are highly similar if they have several features in common and few distinctive ones they do not share. Perceived similarity between the premise category and the conclusion category strengthens inductive arguments and increases the likelihood that a novel property of one category will be generalized to another category (Hayes & Heit, 2017). For example, individuals are more likely to generalize a property from lions to wolves than from hippopotamuses to giraffes.

The similarity-coverage model (Osherson et al., 1990) posits that individuals automatically compute similarity and make inductive generalizations when (a) there is a great deal of overlap between the features of the premise and conclusion categories and (b) there is substantial similarity between premise features and the inferred superordinate category (e.g., mammals) that is inclusive of the premises and conclusion. This model is predictive of the premise-conclusion similarity effect found in many studies of category-based induction (Hayes & Heit, 2017). It can also account for the premise typicality results mentioned earlier. Typical premises have higher mean similarity to the inferred superordinate category than do atypical ones, which means that typicality provides better coverage.
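To make the model's two components concrete, here is a small illustrative sketch in Python. All category names and similarity values below are invented assumptions for illustration only; they are not taken from Osherson et al. (1990), and the blending weight `alpha` is likewise an arbitrary choice.

```python
# Toy sketch of the two components of the similarity-coverage model:
# (a) direct premise-conclusion similarity and (b) coverage of the
# inferred superordinate category. All numbers are invented assumptions.

MAMMALS = ["mouse", "lion", "cougar", "wolf", "horse", "elephant"]

SIM = {  # symmetric pairwise similarities on a 0..1 scale (assumed values)
    ("lion", "cougar"): 0.9, ("lion", "wolf"): 0.7, ("cougar", "wolf"): 0.7,
    ("lion", "mouse"): 0.2, ("cougar", "mouse"): 0.2, ("wolf", "mouse"): 0.3,
    ("lion", "horse"): 0.3, ("cougar", "horse"): 0.3, ("wolf", "horse"): 0.4,
    ("mouse", "horse"): 0.3, ("lion", "elephant"): 0.3,
    ("cougar", "elephant"): 0.3, ("wolf", "elephant"): 0.3,
    ("mouse", "elephant"): 0.2, ("horse", "elephant"): 0.6,
}

def sim(a, b):
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))

def coverage(premises, members):
    # mean, over the superordinate's members, of each member's best
    # similarity to any premise category
    return sum(max(sim(p, m) for p in premises) for m in members) / len(members)

def argument_strength(premises, conclusion, members, alpha=0.5):
    # blend direct premise-conclusion similarity with category coverage
    direct = max(sim(p, conclusion) for p in premises)
    return alpha * direct + (1 - alpha) * coverage(premises, members)

# Diverse premises (mouse, lion) cover "mammal" better than similar
# premises (cougar, lion), so the diverse argument comes out stronger.
diverse = argument_strength(["mouse", "lion"], "elephant", MAMMALS)
similar = argument_strength(["cougar", "lion"], "elephant", MAMMALS)
```

With these assumed similarities, the diverse pair yields higher coverage of the mammal category even though both arguments have the same direct similarity to the conclusion, which is how the model reproduces the diversity and typicality effects described in the text.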

#### 8.3.1.3 Premise Diversity

After typicality and premise-conclusion similarity, probably the next most important attribute to consider is diversity of the actual premises. Other things being equal, arguments are stronger and conclusions are more probable if dissimilar subordinate categories are used as evidence (Smith et al., 1993). For example, if given the information that mice and lions share the same property, we are more likely to predict that elephants and other mammals also have the property than if we are told that cougars and lions share the property. Similarly, as mentioned earlier, premise monotonicity increases the amount of evidence and typically strengthens inductive arguments (Osherson et al., 1990). For example, information that mice, lions, bears, dogs, and horses share a property is stronger evidence than knowing only about mice and lions.

The similarity-coverage model (Osherson et al., 1990) mentioned above accounts for this diversity effect; less similar subordinate categories tend to provide more coverage of the inferred superordinate category (e.g., mammals) than do subordinate categories that are quite similar. In the same way, premise monotonicity provides more coverage of the inferred superordinate category and makes a property more likely to be generalized.

Premise typicality, category similarity, premise diversity, and premise monotonicity involve taxonomic relationships between premises and conclusions. As noted earlier, novices in a domain are more likely than experts to be influenced by these taxonomic relations (Hayes & Heit, 2017). Experts tend to rely instead on thematic, causal, and ecological relations for their generalizations of properties related to their domain of expertise. For example, when tree experts were asked to infer which of two novel diseases would be most likely to affect all trees, they focused on causal-ecological factors related to how tree diseases work and "local coverage", which involves extending the property to other members of the same folk family. In other words, they were not very influenced by typicality and diversity of the premises (Proffitt, Coley, & Medin, 2000).

# 8.3.2 Causal Induction

Predicting what causes certain events and outcomes is an important part of being human. This form of reasoning is commonly used both in science and in our daily lives to advance knowledge and give us a sense of control. For example, predicting that skipping class or not turning in assignments will result in a failing grade can motivate students to attend and finish their work. However, poorly executed causal reasoning can result in superstitions, such as believing that breaking a mirror causes seven years of bad luck.

Causal relations are so important to us that they typically outweigh other information. Even for nonexperts, for example, the presence of a causal relation can override taxonomic ones, such as premise typicality, category similarity, premise diversity, and premise monotonicity. In a demonstration of causality's strong influence (Rehder, 2006), participants were given a novel category (e.g., Kehoe ants) and told characteristic features of its members (e.g., their blood has high amounts of iron sulfate). Participants were then told about a novel property possessed by one of the category members (e.g., it has a venom that gives it a stinging bite) and asked to estimate the proportion of all category members that also possessed this new property. In some conditions, participants were told that the new property was caused by a characteristic feature they had previously learned (e.g., the stinging bite is caused by the high amounts of iron sulfate in its blood). When causal explanations were present, the standard effect of typicality was almost completely eliminated. Additional experiments demonstrated that causal explanations also drastically reduced the effects of premise typicality, diversity, and similarity.

John Stuart Mill (1843) was one of the first to propose a theory of causality; it includes five methods (or canons) of causal analysis that focus on the observation of patterns. Four of the five involve inductive reasoning, and each of these is paraphrased and briefly described below. The first three help people practice Wason's (1960) notion of eliminative induction; ruling out some possible causes helps narrow the hypotheses for what actually is the cause.



Table 8.2: Method of agreement indicating fish as the source of illness.


Table 8.3: Method of disagreement indicating pie as the source of illness.


Mill's method of concomitant variation holds that when two things vary together, one is likely to be the cause of the other or a third unknown variable might be causing the variation in both. Suppose you did not eat berry pie at the buffet, your mom had half a piece, your brother had a whole one, and your dad ate five pieces. You feel fine later that night, your mom feels a bit queasy, your brother is moderately sick, and your poor dad needs to be rushed to the hospital. A highly probable conclusion to infer from this evidence is that suffering from the effect (i.e., food poisoning) is proportional to the cause (i.e., the amount of pie consumed).

Mill's methods provide useful tools for finding potential reasons for effects but they are limited to what we choose to focus on. Potential causes will not be observed and found unless we already have relevant hypotheses about what the causes are likely to be. For example, in discovering the source of food poisoning, factors other than a buffet dinner might be involved.
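The eliminative logic of the first two methods can be sketched in a few lines of code. The diners and menus below are invented for illustration; only the conclusion, fish as the culprit, mirrors the chapter's Table 8.2 example.

```python
# Hedged sketch of eliminative induction in the spirit of Mill's methods
# of agreement and disagreement. Who ate what is invented; only the
# "fish" outcome mirrors the buffet example in Table 8.2.

meals = {
    "you":     {"fish", "salad", "bread"},
    "mom":     {"fish", "soup", "bread"},
    "brother": {"fish", "salad", "soup"},
    "dad":     {"salad", "soup", "bread"},
}
got_sick = {"you": True, "mom": True, "brother": True, "dad": False}

def agreement(meals, got_sick):
    # method of agreement: a candidate cause is present in every sick case
    sick_menus = [menu for person, menu in meals.items() if got_sick[person]]
    return set.intersection(*sick_menus)

def disagreement(meals, got_sick, candidates):
    # method of disagreement: rule out anything a well person also ate
    for person, menu in meals.items():
        if not got_sick[person]:
            candidates = candidates - menu
    return candidates

suspects = disagreement(meals, got_sick, agreement(meals, got_sick))
# suspects == {"fish"}
```

As the text notes, this kind of elimination only works over the hypotheses we thought to record in the first place: a cause outside the `meals` table can never be found.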

Table 8.4: Joint method of agreement and disagreement indicating salad as the source of illness.

Miriam Schustack and Robert Sternberg (1981) examined what sources of information people actually use when making causal inferences about uncertain and complicated situations. For example, participants were given information about various cosmetic companies, including the facts that (a) a company did or did not have a major product under suspicion as a carcinogen and (b) the company's stock had or had not drastically dropped. Participants were then asked to infer the probability that some other cosmetic company would have its stock values drop drastically if it had a major product under suspicion as a carcinogen. Overall results indicated that people confirm a causal relationship in one of two ways. One, which is related to Mill's method of agreement, is based on evidence of the joint presence of the hypothesized cause (e.g., suspicion of a carcinogen) and effect (e.g., declining stock values). The other, which is related to Mill's method of disagreement, is based on evidence of the joint absence of the hypothesized cause and effect. Overall results also indicated that people disconfirm causality in one of two ways. The first focuses on the presence of the hypothesized cause but the absence of the outcome and the other is based on the absence of the cause, yet the outcome still occurs. These overall findings are supported by Rehder's (2006) result that an effect is viewed as more prevalent if the cause is also prevalent.

# 8.3.3 Analogical Reasoning

Even though many people hate answering analogy problems on standardized tests, this form of induction allows us to use familiar knowledge to understand something we do not know. For example, learning the structure of an atom might be easier if it is compared to already acquired knowledge about the solar system. The sun and its orbiting planets can help us comprehend the atom's nucleus and the electrons that move about it. Analogical reasoning can also cause us to consider familiar material in new ways. When Kevin Dunbar and his colleagues (Dunbar, 1995; Dunbar & Blanchette, 2001) videotaped immunologists and molecular biologists during their lab meetings, they discovered that the scientists used analogies 3 to 15 times in any given meeting as an important source of knowledge and conceptual change. For example, when discussing the flagellar pocket, a postdoctoral fellow said, "Things get in, but things. . . It's like the Hotel California - you can check in but you can never check out" (Dunbar, 1995, p. 383).

As implied above, analogical reasoning typically works by comparing two domains of knowledge in order to infer a quality they have in common. The first domain is often the more familiar of the two and it serves as the base or source. It provides a model for understanding and drawing inferences about the target, which is often the more novel or abstract domain (Gentner & Smith, 2012).

Robert Sternberg (1977) used simple picture, verbal, and geometric analogies to determine the components of analogical reasoning (Figure 8.1 shows examples similar to the ones he used). Consider the following verbal problem, which involves choosing the best option for the end of the analogy.

A lawyer is to a trial as a surgeon is to:

(a) a stethoscope, (b) medical school, (c) an operation, (d) patients.

The successful analogy solver encodes the first two terms of the base (i.e., lawyer and trial), which includes forming an appropriate mental representation of them in memory. Next, one or more relations between these two items are inferred (e.g., lawyers present their cases during a trial). The term 'surgeon' is then encoded and an overall relation is mapped between a lawyer and a surgeon (e.g., they are both practicing professionals). This is followed by applying the relation in the base to the target. Finally, a response is prepared and given (i.e., operation is the correct answer because surgeons perform their procedures during an operation and lawyers perform theirs during a trial).
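The component sequence just described can be caricatured in code. This is only a toy illustration of the encode-infer-map-apply-respond steps, not Sternberg's actual model; the tiny knowledge base is a hand-built assumption.

```python
# Toy walk-through of the components of solving A : B :: C : ? analogies
# (encode, infer, map, apply, respond). The knowledge base is a
# hand-built assumption, not part of Sternberg's (1977) model.

WORKPLACE = {  # encode: what we know about where professionals work
    "lawyer": "trial",
    "surgeon": "operation",
    "teacher": "class",
}

def solve_analogy(a, b, c, options):
    relation_holds = WORKPLACE.get(a) == b   # infer: relation between A and B
    if not relation_holds:
        return None                          # mapping A -> C is unjustified
    candidate = WORKPLACE.get(c)             # map and apply: same relation to C
    return candidate if candidate in options else None   # respond

answer = solve_analogy("lawyer", "trial", "surgeon",
                       ["stethoscope", "medical school", "operation", "patients"])
# answer == "operation"
```

Even this caricature shows why encoding matters so much in Sternberg's data: if the relevant relation is never represented (e.g., a term missing from the knowledge base), the later inference and mapping steps have nothing to operate on.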

Using mathematical modeling, Sternberg (1977) analyzed the amounts of time participants spent on the components of analogical reasoning mentioned above: encoding, inference, mapping, application of the relation, and preparation-response. Interestingly, he found that participants spent quite a bit more time on encoding and preparation-response than inference, mapping, and application. Furthermore, for all three types of analogies (picture, verbal, and geometric), the preparation-response component was the one most highly correlated with standardized tests of reasoning.

Figure 8.1: Examples of different types of analogies.

In later work, Sternberg identified higher-order components (metacomponents) that are used to plan, evaluate, and monitor strategies and solutions for analogies and other problems. For example, some individuals do not form a connection between the first and second halves of an analogy because they do not select the lower-level component of mapping (Sternberg & Rivkin, 1979). Other individuals might not select the best strategy for combining lower-level components and end up using an inefficient search strategy for inferring relations between the first two terms in an analogy (Sternberg & Ketron, 1982). Not surprisingly, Sternberg's work on analogical reasoning plays an influential role in his triarchic theory of intelligence (1988) and his theory of successful intelligence (1997).

As Sternberg's work indicates, the elements in an analogy need to be linked by a relation they have in common. In other words, relational (or structural) similarity is a basic constraint of this form of reasoning (Goswami, 2011). Surface similarity is not required; objects in each domain do not need to resemble each other physically or have the same behaviors. For example, computers and humans do not look or act alike but they are relationally similar in terms of information processing. However, surface similarities can facilitate the mapping of relations and improve performance (Gentner, 1989; Holyoak & Koh, 1987).

Interestingly, analogical reasoning is not always done consciously and deliberately. For example, Blanchette and Dunbar (2002) had participants read descriptions of a target topic (e.g., legalization of marijuana) and then read a shorter description of a potential analogical base (e.g., prohibition). Afterwards, when participants were given a recognition test, they erroneously believed that their analogical inferences were concrete facts actually presented in the target description. In other words, they unconsciously inserted their inferences into their mental representations of the target domain.

# 8.3.4 Insight

Sudden insight into the solution of a seemingly impenetrable problem is another form of inductive reasoning (Goswami, 2011) or what Steinbeck (1954) referred to as the inductive leap. The Gestalt psychologists paved the way for later research on this creative and productive way of thinking, which occurs when an individual goes beyond old associations and suddenly views information in a new way (see Chapter 9, "Problem Solving", for more information about Gestalt theory and insight). A novel solution and a subjective feeling of "aha", or suddenly knowing something new without consciously processing how one arrived at it, often accompany this new perception of the situation (Topolinski & Reber, 2010). In contrast, an analytic process involves consciously and systematically evaluating the problem, using strategic thinking and deduction.

In support of this view of insight, Janet Metcalfe (1986a, 1986b; Metcalfe & Wiebe, 1987) and Davidson (1995) found that incremental increases in feelings of confidence (or warmth) that one is nearing a solution negatively predict correct solution of insight problems but positively predict correct solution of deductive reasoning problems. In other words, individuals who felt they were gradually getting closer to solving insight problems tended to arrive at incorrect solutions; others who thought they were far from solving the insight problems and then suddenly realized the answers tended to be accurate. Metcalfe concludes that insight is a subjectively catastrophic process, not an incremental one.

An important source of insight involves cognitive restructuring of a problem's components, which can occur multiple times as an individual moves from general to specific mental representations of a problem (Mayer, 1995). Unlike routine or well-defined problems, ill-defined or non-routine ones are more likely to require individuals to search through a space of alternative approaches (Newell & Simon, 1972) because the givens, goals, and obstacles are not clear. However, it should be emphasized that insight is process-oriented rather than problem-oriented. One individual may solve a problem by having an inductive leap; another person may solve the same problem incrementally and consciously, especially if it is familiar (Davidson, 1995; Webb, Little, & Cropper, 2016).

The Gestalt psychologists believed that people's inability to restructure a problem's components and produce an insightful solution is often due to their fixation on past experience and associations. For example, in what is now seen as a classic insight problem, Karl Duncker (1945) gave individuals three small cardboard boxes, candles, matches, and thumbtacks. The participants' task was to mount a candle vertically on a screen so that it could be used as a reading light. The solution is to light a candle, melt wax onto the top of a box, stick the candle into the wax, and tack the box to the screen. Participants who were given boxes filled with tacks, matches, and candles had much more difficulty solving the problem than did those who received the same supplies outside of the boxes. According to Duncker, seeing a box serve the typical function of a container made it difficult for many individuals also to view the box as a structural support. This phenomenon became known as functional fixedness.

Similar types of mental blocks can interfere with insightful problem solving. In particular, even when we realize that we are approaching a problem incorrectly, we often cannot break our fixation on this approach in order to change our strategies or search for new evidence. Fortunately, taking a break when we reach an impasse often allows us to stop this fixation and see material in a new way when we return to it (Davidson, 2003).

# 8.4 How Does Inductive Reasoning Develop?

Young children have limited knowledge about the world and a lot to learn in a relatively short amount of time in order to adapt well to their environments. Inductive reasoning allows them to acquire new information and fill in gaps in their knowledge. Not surprisingly, research shows that this form of reasoning appears early in development. For example, infants between 9 and 16 months of age make inductive inferences based on perceptual similarities of objects, expecting new ones to wail when squeezed if they physically resemble a previously squeezed one that wailed (Baldwin, Markman, & Melartin, 1993). Although inductive reasoning is relatively continuous across the human lifespan, it becomes more complex as children's cognitive skills, experience, and knowledge base expand and they become better able to evaluate and apply evidence to draw likely conclusions (Goswami, 2011; Hayes, 2007).

# 8.4.1 How Children Use Inductive Evidence

#### 8.4.1.1 Sample Size

Do children, like adults, take sample size into account when making inductive generalizations? Evidence indicates that they do if the tasks are made simple enough. Grant Gutheil and Susan Gelman (1997) asked 8- to 10-year-old children to make inductions based on small and large samples of observable features. For example, children were shown a picture of one butterfly and told that it had blue eyes. They were also shown a picture of five butterflies and told that all of these butterflies had gray eyes. The experimenter then looked at a picture without showing it to the children and asked whether they thought the butterfly in it had blue eyes or gray eyes. The children were significantly more likely to generalize traits, such as eye color, from the large sample than from the small one.

Similarly, it has been found that children younger than age 6 take number of observations into account for their inductive generalizations if the task involves only one sample of evidence (Jacobs & Narloch, 2001; Lawson & Fisher, 2011). If they need to compare a larger sample with a smaller one, the cognitive demands are too great for them to do this well (Gutheil & Gelman, 1997; Lopez, Gelman, Gutheil, & Smith, 1992).

#### 8.4.1.2 Diversity

As discussed earlier, adults are more likely to make inductive generalizations from different types of converging evidence than from only one type. The results for children under age 10 have been more mixed, with some studies finding no evidence of diversity effects (Carey, 1985; Gutheil & Gelman, 1997; Lopez et al., 1992) and others finding that young children often over-generalize from diverse data (Carey, 1985; Lawson & Fisher, 2011). However, if the tasks have low cognitive demands and no hidden properties, young children seem capable of taking diversity into account. For example, when shown pictures of three very different types of dolls played with by Jane and three quite similar dolls played with by Danielle and then shown a picture of another kind of doll, 73% of participants ages 5 and 6 inferred that Jane rather than Danielle would want to play with the new type of doll (Heit & Hahn, 2001). However, it was also found that children were less likely to use diverse evidence when making inferences about remote categories or hidden properties of objects.

Interestingly, Marjorie Rhodes and Peter Liebenson (2015) found that children ages 5-8 appropriately used diverse evidence more than non-diverse information when making inductions about novel categories but not when making them about familiar natural kinds (e.g., birds). In other words, category knowledge interfered with their diversity-based reasoning. In contrast, children ages 9 and 10 generalized more broadly from diverse samples than non-diverse ones when reasoning about both novel categories and natural kinds. These results indicate both developmental continuity and change in diversity-based inductions. At least by age 5, children have the cognitive mechanisms for incorporating different types of information into their generalizations, as shown by their use of diverse evidence when reasoning about novel categories. However, there is developmental change in the situations in which children access these mechanisms.

#### 8.4.1.3 Typicality

Several studies have found that young children are similar to adults in making inductive inferences based on premise typicality or how well an item represents a familiar category. For example, Gelman and Coley (1990) showed 2-year-old children a picture of a typical bird (e.g., robin), told them it was a bird, and asked them about one of its properties (e.g., "Does it live in a nest?"). The children were then shown atypical (e.g., dodo) and typical (e.g., bluebird) category members without the category name (e.g., bird) being repeated and asked if each one lives in a nest. The results were that children projected the property (e.g., living in a nest) to typical category members (e.g., bluebird) 76% of the time and to atypical members (e.g., dodo) only 42% of the time. Similar behavior was found for 3- and 4-year-old children (Gelman & Markman, 1986). In addition, as with adults, premise-conclusion similarity also increased inductive inferences.

# 8.5 Development of Forms of Induction

As implied by the previous section, young children can usually perform category-based inductive reasoning, causal reasoning, and analogical reasoning if the tasks are simple and the children have the requisite knowledge about the properties, categories, and causal or functional relations that are used in the tasks (Goswami, 2011; Hayes, 2007). As Goswami notes about the development of analogical reasoning, "in the absence of the requisite knowledge, it is difficult to reason by induction" (p. 405).

Research indicates that by the time children are around age 5, they most likely use the same broad relations and cues that adults use for their inductive inferences (Hayes, 2007). The developmental changes that do occur are mostly quantitative and gradual, with some types of information, such as causal relations, being applied more frequently and across more domains. As they develop, children's knowledge base increases, their inhibition and memory retrieval processes become more efficient, and their relational working memory capacity improves (Perret, 2015). These cognitive changes allow children to perform more complex category-based inferences, causal inductions, and analogical reasoning.

In addition, some research indicates that children age 6 or older are more likely to have insights than those who are younger. For example, Tim German and Margaret Anne Defeyter (2000) gave children aged 5-7 a task analogous to Duncker's candle problem described earlier in this chapter. Their results showed that 6- and 7-year-olds in the experimental condition were significantly slower to think of the solution, which involved emptying and turning over a wooden box and using it as a support, than the same-age control group that received an empty box. Interestingly, the 5-year-olds in the experimental condition were significantly faster to think of the solution than their older cohorts. Furthermore, they were equally as fast as their same-age peers in the control condition. German and Defeyter conclude that around age 6, children develop a narrower criterion for an object's function than they had earlier in life. Seeing the box used as a container placed that function in their initial representation of the problem. As with the adult participants in Duncker's experiment (1945), these children had to overcome functional fixedness and restructure their initial representation of the problem before they could insightfully solve it. In contrast, the 5-year-olds' fluid conception of the box's function required no restructuring or insight.

To conclude, both children and adults habitually use different forms of inductive reasoning to help make sense of their worlds and to predict future events. Throughout the human lifespan, this form of reasoning is influenced by similar attributes and constraints. These characteristics include number of observations, knowledge base, inhibitory processes, working memory capacity, memory retrieval processes, and the cognitive ability to detect relational similarity (Goswami, 2011; Perret, 2015). As individuals gain experience and expertise in multiple domains, their inductive reasoning becomes increasingly sophisticated for a wider range of problems.

#### Summary


#### Review Questions


# Hot Topic

Janet E. Davidson

My research on insight began in 1982 when Robert Sternberg and I developed a three-process theory of insight. According to this theory, the cognitive processes of selective encoding, selective combination, and selective comparison are used to restructure one's mental representation of the givens, the relations among the givens, and the goals found in a problem in order to find a novel solution.

Selective encoding occurs when an individual suddenly finds one or more important elements in a problem situation that previously had been nonobvious. Selective encoding elicits insight by abruptly restructuring one's mental representation so that information that was originally viewed as being irrelevant is now seen as relevant for problem solution and vice versa.

Selective combination occurs when an individual discovers a previously nonobvious framework for the relevant elements of a problem situation. In many problems, even when the relevant features are known, it is often difficult to know that these features should be combined and then to find a procedure to combine them appropriately. Selective comparison occurs when one suddenly discovers a nonobvious connection between new information and prior knowledge. Analogies, for example, can often be useful for solving new problems.

To be referred to as insightful, the relevant selections must not occur to people immediately upon presentation of a problem. After individuals reach an impasse, they must spontaneously search for and select previously overlooked relevant elements, methods for combining the elements, or connections between prior knowledge and the problem situation. Also, successful search for this relevant information must result in a seemingly abrupt change in the problem solver's mental representation of the problem.

In studies conducted with adults and gifted and non-gifted children as the participants, it was found that the three insight processes play an important role in the solution of non-routine problems and in individual differences in intelligent behavior. More specifically, individuals who solved the non-routine problems correctly were more likely than those who solved them incorrectly to (a) have above average intelligence as measured by standardized tests, (b) apply spontaneously the three insight processes, (c) switch mental representations as a result of these processes, (d) experience a sudden and dramatic increase in feelings of confidence that they were nearing a solution, and (e) take longer than others to solve the problems. The last finding supports the view that successful insights can require additional time to restructure a mental representation for a problem and verify the solution. Correct performance on the non-routine problems was also more highly correlated with scores on a standardized test of inductive reasoning than on scores for deductive reasoning. In addition, it was found that school-age children can be trained on the three processes to perform insightful problem solving; the training effects are transferable and durable. Future work will examine whether preschoolers at a science museum apply the three processes when they solve non-routine problems.

#### References


### References




# Glossary


# Chapter 9

# Problem Solving

#### JOACHIM FUNKE

Heidelberg University

Problem solving is essential for humans to survive in a world that is full of surprises and challenges. Let us start with an example. Imagine the legendary situation on April 11, 1970, when the commander of the "Apollo 13" moon mission, James Lovell, told the people on the ground, "Houston, we've had a problem!" One of the oxygen tanks had exploded and brought the mission close to catastrophe. Through a lot of creative measures (we would call them problem-solving activities), a safe re-entry into the Earth's atmosphere finally became possible. A similar situation happened decades later, at the launch of the space shuttle "Discovery" on July 26, 2005. Film footage from more than 100 surveillance cameras showed that pieces of insulation had broken off the outer tank of the rocket shortly after the launch, endangering the tiles that protect the space shuttle from overheating when re-entering the atmosphere. Fortunately, the damage could be fixed by a repair carried out for the first time in space, and thus the life-threatening situation could be averted (in our terms: the problem could be solved). Our other example does not have such a happy ending and shows just how existential problem solving can be: on February 1, 2003, similar damage to the insulation caused the "Columbia" to break apart during re-entry, killing all 7 crew members.

Of course, problems like these are far from commonplace. But life-threatening situations in space shuttles show in a spectacular way what it means to have a problem: to be pursuing a goal (in this case, to complete the mission and return to Earth alive) and suddenly not to know if and how this goal can be achieved because there is an obstacle or a barrier.

Problem solving is one of the highest forms of mental activity we know. The problem solutions resulting from this activity have contributed significantly to the success (and thus survival) of the human species, not only on the individual level but also on a cultural level (e.g., in the form of speaking, writing, and counting). To this day we know of no other creature on this planet besides humans that shapes its life in a comparable way through planned action and problem solving. However, this is no cause for unrestrained optimism about unlimited progress. These human capabilities also harbor the greatest destructive potential that has ever been observed in a species.

This chapter presents important concepts and results from the field of problem-solving research. The two parts of the term *problem solving* suggest starting with the problem part (distinctions that have to be made regarding types of problems – not all problems exhibit the same characteristics) and then moving on to the solving part (which consists of different phases and unfolds over time). Different theories will be described, together with an overview of methods for analyzing problem-solving activities. Finally, the main aspects of this chapter will be summarized.

# 9.1 The Problem Part: What Constitutes a Problem?

Problems are normally embedded in certain domains. A domain can be as exotic as "space shuttle" or as ordinary as "playing cards" or "driving a car". In each domain, a given situation can be described as a state that can be changed by means of operators (tools). For example, the current state of my chessboard can be changed by using one of the legal moves (the operators) that bring me closer to my goal of winning (the goal state). Sometimes there are barriers on the way from the starting point to the goal state. So, if a person wants to reach a certain goal state in a given domain and does not know how to reach it or how to overcome a barrier, this person has a problem.

The important parts of a problem can be identified as follows: the *actor* wants to reach a *goal* in a specific *domain*; there are different *states*; *changes* between states are possible with the help of *operators*; and *barriers* on the way from a given state to the goal state have to be *overcome*. For example, in the case of the space shuttle mentioned earlier, the problem consists in tiles having fallen off, the goal is to return safely to Earth, and the operators were the activities that transformed the given state into the goal state.
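Anticipating the information-processing view described in Section 9.3.3, these components can be expressed directly as a small data structure. The following Python sketch is purely illustrative; the toy domain and all names are my own choices, not taken from the problem-solving literature:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Problem:
    """A problem as an initial state, a set of operators, and a goal test."""
    initial_state: int
    operators: List[Callable[[int], int]]  # state changes available to the actor
    is_goal: Callable[[int], bool]         # recognizes the goal state

# Toy domain: starting from state 1, reach state 10 using "+1" and "*2".
# A barrier exists whenever the actor knows no operator sequence to the goal.
toy = Problem(
    initial_state=1,
    operators=[lambda s: s + 1, lambda s: s * 2],
    is_goal=lambda s: s == 10,
)
print(toy.is_goal(toy.operators[1](5)))  # applying "*2" to state 5 reaches the goal
```

The design point is merely that "state", "operator", and "goal" are separate ingredients: the same formal skeleton fits chess moves as well as shuttle repairs.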

There are different types of problems, depending on the clarity of the goal description and on the tools that can be used to change the state of affairs. In terms of goal clarity, a well-defined problem with a clear goal description (e.g., winning a game of chess) is distinguished from an ill-defined problem that has no clear goal (e.g., the political situation in the Middle East: what would the best political goal even be?).

# 9.2 The Solving Part: What are the Steps to the Solution?

Traditionally, action-theoretical approaches differentiate several phases in the course of action (cf. Cranach & Tschan, 1997; Dörner & Wearing, 1995; von Wright, 1974; Werbik, 1978). Dewey (1910) already explained in his book *How We Think* that people take a certain sequence of steps when solving problems. It begins – according to Dewey – with a feeling of doubt (= the problem), continues with the identification of the problem, the search for relevant facts, and the formulation of first draft solutions. Then come the examination of these solutions and, if necessary, the reformulation of the problem; the process finally ends with the selection and realization of the solution assumed to be correct.

According to Pretz, Naples, and Sternberg (2003, p. 3f.), problem solving runs through the following stages (they call this the "problem-solving cycle"): (1) recognize or identify the problem; (2) define and represent the problem mentally; (3) develop a solution strategy; (4) organize one's knowledge about the problem; (5) allocate mental and physical resources for solving the problem; (6) monitor progress toward the goal; and (7) evaluate the solution for accuracy.

This is an idealized sequence of steps, and good problem solvers adapt it to situational requirements. For example, in some cases the representation step may require considerable effort, whereas the step of allocating resources might be short. Pretz et al. call this sequence a "cycle" because solving one problem often generates new problems and thus requires the cycle to run again with the new problem.

The assumption of different phases of problem solving, described early by Bales and Strodtbeck (1951) and later by Witte (1972) as the "phase theorem" of problem solving, has both a descriptive and a prescriptive side: it is descriptive in that it is intended to describe the processes actually taking place during problem solving; it is prescriptive insofar as the sequence is also intended to serve as a rule for "good" problem solving. As Lipshitz and Bar-Ilan (1996) point out, this theorem in its manifold manifestations is indeed an important component of the problem-solving literature, but neither its descriptive nor its prescriptive validity is well supported by empirical evidence, perhaps because these distinctions are logical rather than empirical. Thus, the various phases of the course of action, which will be discussed in more detail below, have primarily an ordering and thus meaning-giving function. A distinction is made here between the following five phases: a) goal elaboration, b) hypothesis formation, c) planning and decision making, d) monitoring, and e) evaluation:

a) *Goal elaboration*. At the beginning of an action there is a goal (motivationally: a desired satisfaction of a need; cognitively: a target state to be reached) whose specificity can vary. The more unspecific the goal (e.g., in the case of an ill-defined problem), the more effort must be put into elaborating the goal and overcoming dialectical barriers.

b) *Hypothesis formation*. Before acting, it is necessary to model the environment in which one acts. To this end, assumptions must be formulated about the relationships between the variables involved in order to exert an appropriate influence on this environment. Depending on the characteristics of the environment (e.g., computer simulations; see below), hypotheses can be formed and tested during the individual steps of an action.

c) *Planning and decision making*. Based on the hypotheses, intervention sequences need to be formulated that seem suitable for transforming the initial state into the goal state. This preparation of future decisions is called planning – an important component of action, since it contains the preparations for a good (in the sense of goal-oriented) course of action. In Funke and Glodowski (1990), this phase is referred to as the *creation* of a plan, which is intended to underline its constructive aspect. However, efficient planning relies as much as possible on experience (retrieval from long-term memory) and on reusing "old" plans, thus minimizing effort (in computer science this aspect is called "reusability"; see Krueger, 1992).

d) *Monitoring*. The phase of drawing up the plan is followed by a phase of plan monitoring, intended to ensure that the implementation of the plan does not give rise to much disruption due to "frictions" (Clausewitz, 1832). Frictions occur as unforeseen (usually also unforeseeable) disruptions during the execution of the plan and require corrective interventions, up to and including the termination of the plan.

e) *Evaluation*. The final phase consists of examining whether the result of the action corresponds to the objective(s) formulated at the beginning. Further action and problem solving might be necessary.

Fischer, Greiff, and Funke (2012) see the process of complex problem solving as a mixture of two phases, namely knowledge acquisition and knowledge application. These authors emphasize the importance of (1) information generation (due to the initial intransparency of the situation), (2) information reduction (due to the overwhelming complexity of the problem's structure), (3) model building (due to the interconnectedness of the variables), (4) dynamic decision making (due to the eigendynamics of the system), and (5) evaluation (due to the presence of many interfering and/or ill-defined goals).

In contrast to conceptions of more or less ordered processes, there is the assumption of "muddling through". Coming from the field of policy-making in public administration, Lindblom (1959, 1979) argues that decision-making in complex situations cannot follow a simple means-ends relationship. Instead, he proposes a kind of "incrementalism" (=muddling through), i.e. small changes towards certain goals following a series of trials, errors, and revised trials.

# 9.3 Problem Solving: What are the Theories?

In the short modern history of problem-solving research, there have been three major theoretical approaches to problem solving: Gestalt theory (including insight problem solving), action theory, and information-processing theory. The basic ideas, important terms, and the respective definition of a problem are given for all three approaches. A review of problem solving theories can be found in the recent paper by Fischer, Greiff, and Funke (2012).

## 9.3.1 Gestalt Theory

Problem-solving theories based on Gestalt principles were developed in analogy to concepts from the psychology of perception in Germany at the beginning of the 20th century (for a short history of Gestalt concepts, see Wertheimer, 2010). The basic idea at that time was that the field of perception does not consist of isolated elements but rather is organized into groups or shapes. In line with the principle of supersummativity, according to which the whole is more than the sum of its parts, it is postulated for thinking tasks as well that organized forms emerge from different parts and determine the solution. For example, consider the well-known nine-dot problem, in which nine dots arranged evenly in a square have to be connected by drawing four straight lines without lifting the pen. The arrangement of the dots creates a shape, which in this case is an obstacle to the solution: the square form erroneously suggests that the lines should be drawn within the four corners of the square – in fact, however, one must go *beyond* this boundary in order to find a solution (see Figure 9.1).

Important terms that Gestalt psychology contributed to today's psychology of thought are: insight and aha-experience, restructuring, functional fixedness, and Einstellung. *Insight* and *aha-experience* describe experiential qualities that occur in the solution phase of a problem and denote the sudden understanding of an initially incomprehensible, problematic fact (e.g., understanding a magician's trick). *Restructuring* means a change in the perceptual or attentional structure (e.g., interpreting the background as foreground). *Functional fixedness* occurs when the familiar function of an everyday object blocks its use in a new, unusual function (e.g., a matchbox holds the matches used to light a cigarette, but could later serve as a candleholder). The *Einstellung effect* (also called *set effect*) occurs when a certain solution pattern becomes routine for similar problems and is executed even when simpler solution paths exist (e.g., using a complicated solution sequence for filling water jars even when simpler sequences exist; Luchins & Luchins, 1950).

*Definition of a problem*: According to Gestalt theories, a problem is characterized by a bad gestalt that can be transformed into a good gestalt through restructuring as a result of insight. The problem-solving process thus presupposes the recognition of the bad and the good gestalt as well as the occurrence of insight.

## 9.3.2 Action Theories

Action theories differentiate between several stages of action: action planning, action execution, and action evaluation. They do not isolate specific psychic sub-functions but rather determine their contribution to the more comprehensive form of an action and its context. In addition, action theories address intentions that give meaning to certain behaviors (for the distinction between behavior and action, see Graumann, 1980). For example, if you see somebody on a cold winter day in a summer dress, this strange behavior can become understandable if the person explains her intention to train her immune system. Strange behavior, thus, becomes intentional action.

Action theories have an integrative function and can help to compensate for the fragmentation of psychology into separate parts by providing a general frame of reference. It is interesting from a historical point of view that at the very time John B. Watson formulated his radical "manifesto of behaviorism" in the USA, recommending that psychology restrict its theory and research to intersubjectively undisputed "pure" behavior (Watson, 1913), the Heidelberg sociologist Max Weber was building a "sociology of understanding" on the basic concept of action (Weber, 1913).

*Definition of a problem*: According to action theories, a problem is characterized as part of a goal-driven, intended action that reaches a dead end and requires active regulation processes to overcome the barrier or to find another course of action that leads to the goal state.

## 9.3.3 Information-Processing Theories

Theories of information processing are inspired by the idea of conceiving human cognition as symbol manipulation. Starting from the cognitive turn in the 1950s (for a more detailed description of this revolution, see Gardner, 1985) and against the background of the information theory presented by Shannon and Weaver (1949), all kinds of mental activity – perception, learning, thinking, etc. – were summarized under the term *information processing*. Information became the raw material that the organism absorbs, stores, and processes.

Figure 9.1: The nine-dot problem: Nine dots arranged evenly in a square (left side) are to be connected by four straight lines without lifting the pen.

The underlying idea of interpreting the organism's information processing as symbol manipulation makes it possible to reproduce such processes on a computer ("cognitive modeling"); the division into data (symbols representing certain states) and program (symbols representing certain transformations of symbols) is of minor importance, given that symbols are involved in both. Important for the symbolic system of human language is its tool function for thinking. The "inner conversation of the soul with itself" (= thinking), as the Greek philosopher Plato formulated it over 2000 years ago, is nothing other than information processing (see also Chapter 11, "Nature of Language").

### 9.3.3.1 Problem Space and Task Environment

When a motivated person deals with an intellectual requirement, an analysis of behavior provides information about both the task and the thought processes. Both aspects are inextricably linked, but should nevertheless be kept apart conceptually. For a better understanding, Newell and Simon (1972) therefore introduced the term task environment to describe the symbolic content that is necessary to solve a problem. This externally given information corresponds to the internally constructed problem space, which describes the subjective representation of a task, i.e. the imaginary space in which problem solving takes place during thinking. Their influential theory of problem solving is described in more detail in Textbox 9.1.

#### Textbox 9.1: Theory of Problem Solving by Newell and Simon

In their book "Human Problem Solving", Newell and Simon (1972) presented a theory of problem solving that has been widely and sustainably received and still represents the basis of many approaches in this field today. Two cooperating sub-processes form the core of their theory: the process of *understanding* and the process of *searching*.

*The process of understanding*. The understanding process has the function of generating the internal representation of the problem. The problem situation must be perceived in order to deduce from the initially given information (a) what the initial state is, (b) which operators can be used to change the state, and (c) how to recognize that an achieved state represents the goal. These three components make up the problem space, which is constituted by the process of understanding (see below for more). Of course, the problem space can change during the solution process when new information becomes known, whether due to external circumstances or due to search processes.

*The search process*. The search process has the function of generating the solution to the problem. It is driven by the result of the understanding process: it searches for differences between the given state and the target state and for operators that could bring about a state change. Search procedures that require little task-specific knowledge have been called "weak methods". They are weak because their generality comes at the expense of their power. Specific methods ("Use the hammer to drive in the nail!") are stronger but rarely applicable (a hammer does not help to fasten a screw). More general methods ("Find a tool to get ahead!") are applicable more widely, but weaker (which tool to use remains open).
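The weakest general method of all is blind, exhaustive search over the problem space, and it can be written down in a few lines. The toy domain below (reach 10 from 1 using the operators "+1" and "*2") is my own illustrative choice, not an example from Newell and Simon (1972):

```python
from collections import deque

def breadth_first_search(initial, operators, is_goal):
    """A 'weak' general method: blind breadth-first search through a
    problem space defined only by states, operators, and a goal test."""
    frontier = deque([(initial, [])])   # (state, path of visited states so far)
    visited = {initial}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path + [state]       # solution path through the problem space
        for op in operators:
            nxt = op(state)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [state]))
    return None

# Toy problem space: integer states, operators "+1" and "*2", goal state 10.
print(breadth_first_search(1, [lambda s: s + 1, lambda s: s * 2],
                           lambda s: s == 10))  # → [1, 2, 4, 5, 10]
```

The method's generality is visible in the signature: nothing in the code knows what the states mean, which is precisely why it is weak.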

One might think that the two processes of understanding and searching described by Newell and Simon would be executed in a fixed order (first understanding, then searching). In fact, however, problem solvers often switch back and forth between the two processes and mix them (see Chi, Glaser, & Rees, 1982; Hayes & Simon, 1976).

With their ideas, Newell and Simon (1972) pointed to an important issue for problem-solving research. They distinguish between psychological processes on the part of the problem-solving person on the one hand and perfect rationality on the other – a distinction that results from the bounded rationality (Simon, 1947) of human behavior. Incidentally, Herbert Simon was awarded the Nobel Prize in Economics in 1978 for these considerations and the associated criticism of the theory of the perfectly rational *homo oeconomicus*.

The idea of a problem space inspired Simon and Lea's (1974) "dual space model", which divides the problem space into a *rule space* and an *instance space*. In the rule space, all possible rules of a task are represented; in the instance space, all possible states. Using the example of chess, the rules represent the legal moves of each piece (the operators), while the instances are all possible positions the pieces can occupy.

Using the example of cryptarithmetic problems (see below, Section 9.5.1.2, "Cryptarithmetic Problems"), where letters stand for numbers, the instance space consists of the individual column elements of the letter addition, whereas the rule space contains the rules for replacing letters with numbers. Problem solving in this case means finding letter-number substitutions for which the resulting arithmetic operation is correct. If, for example, the task is to assign numbers to letters so that the addition DONALD + GERALD = ROBERT becomes correct, and the problem solver also knows that D = 5, a replacement process can be carried out that rewrites the instance space as 5ONAL5 + GERAL5 = ROBERT.

By applying the rules of arithmetic, the last digit of the result must be T = 0 (since 5 + 5 = 10), and thus the rule space is extended. What can be done to find the complete solution?

With the method of (a) "generate-and-test", one simply tries out arbitrary assignments of numbers to letters. More intelligent is the method of (b) knowledge-guided "heuristic search", which does not produce arbitrary new states in the instance space but only those that fulfill certain preconditions; e.g., R must be an odd number because of the carry from the second-to-last column and the fact that adding two identical numbers (L + L) always produces an even result. An alternative description of this process is the method of (c) "rule induction", which is used to check whether a certain assumption, such as R = 7, is not only correct in a concrete case but is also consistent with all other available data.

Figure 9.2: (a) Programmable truck BigTrak. (b) Keypad for programming. The keypad shown differs from the one used in the experiment by having a X2 key instead of a RPT key (both figures from Wikimedia Commons, licensed under the terms of CC-BY-SA-2.0).
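The generate-and-test method (a) can be made concrete in a few lines of code. The sketch below assumes the classic DONALD + GERALD = ROBERT puzzle, which is consistent with the constraints mentioned in the text (D = 5, hence T = 0, and R odd); fixing D = 5 keeps the blind enumeration tractable:

```python
from itertools import permutations

def solve(words, result, fixed=None):
    """Generate-and-test: enumerate digit assignments and return the first
    one that makes the column addition arithmetically correct."""
    fixed = dict(fixed or {})
    letters = sorted(set("".join(words) + result) - set(fixed))
    free_digits = [d for d in range(10) if d not in fixed.values()]
    leading = {w[0] for w in words + [result]}   # leading letters may not be zero
    for digits in permutations(free_digits, len(letters)):
        mapping = {**fixed, **dict(zip(letters, digits))}
        if any(mapping[c] == 0 for c in leading):
            continue
        value = lambda w: int("".join(str(mapping[c]) for c in w))
        if sum(value(w) for w in words) == value(result):
            return mapping
    return None

# D = 5 is given in the text; T = 0 then follows from the last column (5 + 5 = 10).
m = solve(["DONALD", "GERALD"], "ROBERT", fixed={"D": 5})
print(m["T"], m["R"])  # T = 0 and R = 7 (odd, as derived in the text)
```

The heuristic-search variant (b) would add tests such as "R must be odd" inside the loop to prune the instance space before the full addition is even checked.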

Simon and Lea (1974) emphasize that their approach is useful not only for cryptarithmetic problems but also for the description of concept acquisition, sequence learning, or the recognition of grammars. The "General Problem Solver" (GPS) is accompanied by a "General Rule Inducer" (GRI) which supports exactly these processes concerning the generation and testing of possible solutions.

Klahr and Dunbar (1988) further extended the dual space model. They developed their SDDS model ("Scientific Discovery as Dual Search") to explain scientific discoveries. In this model, they differentiate between the *experiment space* (which is similar to the instance space) and the *hypothesis space* (similar to the rule space). In the hypothesis space, hypotheses are generated, modified, and rejected, e.g., hypotheses about connections between input and output variables. In the experiment space, on the other hand, experiments are planned with which the generated hypotheses can be tested or with which the effects of operators can be explored. For this purpose, the two problem spaces (as in Simon & Lea, 1974) must interact: activities in the hypothesis space activate operations in the experiment space. There is also the opposite direction of influence: if no hypothesis about the object of investigation has yet been formed (search in the hypothesis space), one can simply apply operators (search in the experiment space); hypotheses can then be derived by observing the results of these experiments.

To illustrate their approach, they chose the programmable toy truck "BigTrak" (see Figure 9.2), whose movement behavior can be programmed via certain keys (e.g., two steps forward, honk, two steps to the right). The keys are divided into 11 instruction keys (e.g., GO, CLS, HOLD) and 10 number keys (0-9). The participant's task is to find out the meaning of the unexplained RPT key (solution: RPT x repeats the last x instructions). The search for the meaning of this function key leads to the formation of hypotheses and the execution of experiments (see Shrager & Klahr, 1986).

A total of 20 participants in this experiment learned to program BigTrak within 20 minutes, thinking aloud while working on the task. They then had to explore the RPT key, which had neither been used nor explained before. Of the many results of this investigation, only one is described here, concerning a typology of the participants: according to the authors, 7 participants can be called "theorists", whereas the remaining 13 participants were labelled "experimentalists". On average, theorists needed 24.5 minutes to solve the problem and performed 18.3 experiments (12.3 with specific hypotheses), whereas the experimentalists needed only 11.4 minutes and performed 9.3 experiments (8.6 with specific hypotheses). While the theorists searched the hypothesis space, the experimentalists concentrated on the experiment space and attempted to derive generalizations from their experiments.

With the dual space model, such results can be explained in terms of strategies, semantic embedding (cover story), goal specificity, hypothesis testing, and knowledge acquisition. The model also points to the fact that many studies with interactive tasks like BigTrak did not distinguish between an exploration phase and an application phase (in the exploration phase, an unknown system is explored; in the application phase, explicitly specified goals have to be reached): the participants knew the target values or the goal state of the system (a specific goal) from the outset. The task could then also be solved by trying to reach the goal via means-end analysis (search in the instance space) without formulating hypotheses. Such participants do not acquire knowledge about the system but learn how to reach the goal (implicit knowledge; see Berry & Broadbent, 1988). Geddes and Stevenson (1997), for example, explained the dissociation of knowledge and goal attainment in this way. If, on the other hand, explicit knowledge is acquired, hypothesis generation and testing are present (search in the rule space). Search within the rule space can be encouraged by instructing participants to use a systematic strategy and by giving no target values. Semantic embedding of a problem (instead of a merely abstract description), as well as the specification of a hypothesis, leads to more hypotheses being tested and thus likewise encourages search in the rule space.

Funke Problem Solving

With the help of the dual space model, the results of the BigTrak experiment and of similar interactive tasks can be interpreted easily, and it becomes apparent why something was learned in some tasks and not in others. Nevertheless, there are findings that make an extension of the model necessary. One such finding is, for example, that sometimes a specific goal leads to better performance if the subjects have an *incomplete* model of the task (Burns & Vollmeyer, 1996). Even the specification of false hypotheses (Vollmeyer, Burns, & Holyoak, 1996) leads to improved performance in complex problems, which can be interpreted indirectly as an indication of an intensified search in the hypothesis space (see also Burns & Vollmeyer, 2002).

*Definition of a problem*: According to information-processing theories, a problem is defined as a barrier between a given state and a goal state, requiring a bridging operator that cannot be taken from the library of already known operators but has to be constructed on the fly. Problem solving is seen as a search for a solution within the problem space.

# 9.4 Methods for Assessing and Measuring Problem Solving

Because problem solving occurs in a person's head, it is not easy to assess the process of problem solving itself. Different proposals have been made to solve this problem (see also Chapter 3, "Methods"). On the one hand, there is access via self-reports (e.g., introspection and thinking aloud; see below); on the other hand, access via behavioral data (e.g., behavior traces and log-files; see below). Last but not least, physiological data (e.g., eye movements and brain-imaging techniques) have been proposed.

## 9.4.1 Self-Reports

Introspection is the observation of one's own mental processes. It was used in the 19th century by "armchair" psychologists who would rely on their own inner experience instead of empirical observations. Introspection is deemed unsuitable in modern research because there is no way to verify the accuracy of the resulting reports.

Thinking aloud is the continuous verbalization of thought processes during problem solving and can be used as a valid data source under certain conditions (Ericsson, 2003). The spontaneous utterances accompanying the act of thinking represent objective expressive behavior that is used for assessment (Jäkel & Schreiber, 2013).

Ericsson and Simon (1983) regard thinking aloud as unproblematic as long as the actual thought content is merely verbalized, because such verbalization only slows down the thinking process but does not disturb it. Carefully explaining or interpreting one's thoughts, however, does disturb the process of thinking and changes the participant's procedure (see Ericsson, 2006). Güss (2018) recommends this method especially for testing theories cross-culturally.

Verbal data is valid even if there is no 100% agreement between thoughts and verbalizations. Reasons for this deviation are (a) that not all conscious thoughts are verbalized by a participant and (b) that other cognitive steps run unconsciously due to routine/expertise and therefore *cannot* be verbalized at all. Additional data sources such as reaction times, error rates, eye movement patterns, or recordings of brain activity can increase validity. It is not the thinking itself that manifests itself as behavior but rather the consequences that accompany it.

## 9.4.2 Behavioral Data

Three behavioral measures will be discussed briefly: sequential problems, computer-simulated problems, and log-file analyses.

By using *sequential problems*, one tries to visualize the solution path between the initial and the target situation (and thus the process of the solution) as a series of intermediate states. A good example of a sequential problem is the Tower of Hanoi (see below). Sequential problems "materialize" the solution process by producing a trace through the problem space.
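The way a sequential problem "materializes" its solution path can be sketched in code: the classic recursive solution to the Tower of Hanoi returns exactly such a trace of intermediate moves. The peg names below are arbitrary illustrative choices:

```python
def hanoi(n, source, target, spare):
    """Return the move sequence (the trace through the problem space)
    for transferring n disks from source to target."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)      # clear the way to the largest disk
            + [(source, target)]                     # move the largest disk
            + hanoi(n - 1, spare, target, source))   # restack the smaller disks on top

moves = hanoi(3, "A", "C", "B")
print(len(moves))  # 2**3 - 1 = 7 moves, i.e., 7 observable intermediate states
```

Each tuple in the returned list is one observable state transition, which is precisely what makes such tasks attractive for process analysis.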

*Computer-simulated scenarios* allow the investigation of the effects of connectedness and dynamics in complex situations by creating realistic simulation environments. Connectedness (i.e., the relationships between the variables in a system) forces problem solvers to create causal models. The dynamics of a system force them to anticipate the course of development over time and to act with foresight. The interaction of human participants with such scenarios reveals their strategic approaches and their reactions to certain events. One can measure how well the connectedness between the system variables is understood and how well participants deal with the dynamics of the system.

*Log-file analyses* look at the step-by-step activities during interactions with computer-presented problem-solving tasks. Such tasks were used for the first time in a world-wide assessment of student performance in problem solving within PISA 2012, the OECD's "Programme for International Student Assessment". Zoanetti and Griffin (2017) showed the advantages of going deeper into the specific solution steps documented in the log-files instead of looking only at the results of certain tasks. For example, pupils who repeatedly interacted erroneously with the software and ignored negative feedback could easily be identified, and solution strategies became visible.

## 9.4.3 Physiological Measures

*Eye-movement patterns* can be used to infer the processes underlying thinking. Eye movements consist of saccades (fast, short movements of the eyeball that align the fovea with visual targets) and fixations (keeping the gaze on a single location). It is assumed that a large part of information processing takes place during fixations.

Eye-movement measurements are used in addition to reaction-time and decision-time measurements in specific fields of experimental psychology, such as the psychology of perception. Pupillometric data allow conclusions to be drawn about working-memory load, concentration, and emotional and motivational components. Beatty (1982) describes several experimental and correlational studies that warrant such statements.

Also, *brain-imaging methods* can be used to depict physiological changes during thinking. Imaging methods such as functional magnetic resonance imaging (fMRI) are of particular importance for the investigation of problem solving. The aim of such a method is to measure haemodynamic changes of the brain (i.e., changes in the blood flow within the brain due to cerebral activity) as a marker for neuronal activation within certain brain structures.

fMRI is a spatially high-resolution method, meaning that it allows activity to be localized very precisely within regions of the brain. It is based on the fact that an increase in neuronal activation leads to an increase in oxygen demand, which in turn leads to an increased supply of oxygen-rich blood. This increase in oxygenation can be made visible by means of a magnetic field, so that changes in neuronal activity become accessible. The application of neuroimaging techniques to research questions in the field of problem solving is still rare (Anderson et al., 2008).

# 9.5 Paradigms and Illustrating Experiments

For illustrative purposes, the following section presents some of the frequently used tasks in problem-solving research. I will start with examples of simple tasks and then turn to complex ones.

## 9.5.1 Simple Tasks

Simple task requirements differ from complex ones in the small amount of instruction and knowledge required to process them. With regard to the amount of knowledge required for understanding the problem situation, one could also speak of semantically impoverished problems as opposed to semantically rich problems. In addition, simple tasks usually have short processing times of up to 10 minutes, whereas complex tasks require hours or days. The simple tasks include (a) classic mental exercises (such as insight problems), (b) cryptarithmetic problems (where letters represent numbers), and (c) sequential problems like moving disks.

#### 9.5.1.1 Insight Problem Solving

In the early days of problem-solving research, brain teasers and insight problems were the preferred research material. Classic insight problems were presented, for example, by Duncker (1935) as part of his book *Psychology of Productive Thinking*. He examined the problem-solving process more closely, especially with regard to two problems:


Figure 9.3: Duncker's Radiation Problem: A patient needs a radiation treatment on a tumor inside the body. Normal radiation will harm the healthy tissue it reaches on the way in. The solution is to target the tumor with low-level rays coming from different directions that have to converge on the tumor (from http://www.jimdavies.org/research/visual-analogy/proposal/node1.html).

Figure 9.4: Two examples of matchstick arithmetics: (a) 4 = 3 + 3 (solution: 6 = 3 + 3); (b) 3 = 3 + 3 (solution: 3 = 3 = 3; from Knoblich et al., 1999).

Duncker's survey method was not self-observation (introspection), as practiced, for example, by representatives of the historical Würzburg School (Oswald Külpe, Karl Marbe, Otto Selz), but rather observing somebody "thinking aloud", a method in which the thinker remains directed at the content of his or her thinking. His analysis of the proposed solutions to the radiation problem shows that the various ideas can be arranged according to their "functional value". Duncker called this arrangement a "solution tree".

Insight problems using "match-stick arithmetic" were investigated by Knoblich and coworkers (Knoblich, Ohlsson, & Raney, 2001). An insight problem occurs when an obstacle appears after the first exploration ("impasse", dead end) and the solution appears subjectively impossible (see Metcalfe, 1986). One can get out of these mental dead ends only by changing the representation of the problem. Two examples from the work of Knoblich et al. (1999) will be presented in more detail (see Figure 9.4).

Problems in the field of matchstick arithmetic consist of false arithmetic expressions composed of Roman numerals (I, II, III, etc.), arithmetic operators (+, -), and the equal sign (=). By moving a single match, the false expression has to be turned into a correct one. In Figure 9.4a, for example, the IV can be turned into a VI. This reflects the typical representation, in which the numerical values are regarded as variable and the arithmetical operators as constant. If one loosens this boundary condition and allows the operators to be seen as variable too, the task in Figure 9.4b can be solved by making a "=" out of the "+". Besides the loosening of boundary conditions, the problem representation can also be changed by the decomposition of chunks (single elements combined into groups). Thus, "weak" chunks like "IV" are distinguished from "strong" chunks like "X", whose decomposition into "/" and "\" is more difficult because the individual parts lack significance on their own.

Based on these two postulated mechanisms for changing the problem representation, Knoblich et al. could make specific predictions about task difficulties and differential transfer effects for matchstick problems, which were confirmed in the reported experiments. Accompanying eye-movement analyses (Knoblich et al., 2001) also confirmed the following theoretical assumptions: (a) at "dead end" states, there are fewer eye movements and longer fixation times; (b) as a result of prior arithmetic knowledge, people tend to regard the numerical values, and not the operators, as the variable quantities.

Matchstick arithmetic is an interesting problem type that can be used to investigate the elementary thought processes of insight problems. In combination with eye-movement analyses, this simple paradigm allows process theories to be tested that would otherwise hardly be accessible to empirical research. The small amount of knowledge these problems require to be solved is an advantage for empirical and systematic analyses. At the same time, simple problems do not capture the complexity of problem solving in everyday situations, let alone in space shuttle catastrophes, since much more world knowledge is usually needed in real-life problem solving.

*Anagram tasks*. Another approach to gaining insight into the underlying processes of problem solving comes from the analysis of solution processes for anagram tasks. Anagrams represent letter sequences that must be changed around to form a word (e.g., HOOLSC -> SCHOOL). In this case, the difficulty can be influenced by the number of letters that have to be changed, the total number of letters given, and word frequency.
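The check whether a candidate word solves an anagram is simple to formalize: the scrambled string and the candidate must contain exactly the same letters. A minimal sketch (the word list is made up purely for illustration):

```python
def is_anagram(scrambled: str, candidate: str) -> bool:
    """Two strings are anagrams if sorting their letters yields the same sequence."""
    return sorted(scrambled.upper()) == sorted(candidate.upper())

# Checking a scrambled string against a small illustrative word list:
words = ["SCHOLAR", "COOLS", "SCHOOL"]
print([w for w in words if is_anagram("HOOLSC", w)])  # ['SCHOOL']
```

Note that this check says nothing about how human solvers find the word; it only defines what counts as a solution, which is what makes difficulty manipulations (letter moves, word frequency) possible in the first place.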

Metcalfe and Wiebe (1987) have shown that anagram solutions rely on sudden insight rather than on a gradual, sequential approximation to the answer. They showed that by collecting "hot-cold judgments" (an indication, recorded every 10 to 15 seconds, of how close a problem solver feels to the solution) one can trace the process of gaining insight. While these judgments gradually increased as equations were solved, they remained consistently low for anagrams and rose steeply only shortly before the solution was found (see Chapter 6, "Metacognition", for further research with anagram tasks).

#### 9.5.1.2 Cryptarithmetic Problems

Cryptarithmetic problems require the decoding of letters into numbers using arithmetic procedures. Figure 9.5 illustrates an example of such puzzles.

Cryptarithmetic problems are not used much nowadays because of their simplicity and the uniformity of the required processes: decoding is a relatively simple constraint-satisfaction task in which the total number of possible states is reduced by the constraint that each letter stands for a unique digit in a decimal representation. To make the task easier, more letters could be disclosed at the outset. The last prominent publication with this type of problem dates back more than 25 years (Clearwater, Huberman, & Hogg, 1991).
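The constraint-satisfaction character of such puzzles is easy to demonstrate in code. The sketch below brute-forces the classic SEND + MORE = MONEY puzzle, a standard instance of this problem type whose unique solution is consistent with the hint given in Figure 9.5 (E=5, Y=2), by trying digit assignments until the addition works:

```python
from itertools import permutations

def solve_send_more_money():
    """Brute-force constraint satisfaction for SEND + MORE = MONEY:
    each of the eight letters must stand for a unique decimal digit."""
    for s, e, n, d, m, o, r, y in permutations(range(10), 8):
        if s == 0 or m == 0:  # leading digits must not be zero
            continue
        send = 1000 * s + 100 * e + 10 * n + d
        more = 1000 * m + 100 * o + 10 * r + e
        money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
        if send + more == money:
            return {"S": s, "E": e, "N": n, "D": d,
                    "M": m, "O": o, "R": r, "Y": y}

print(solve_send_more_money())  # E maps to 5 and Y maps to 2
```

The brute-force search illustrates why the task is considered uniform: the constraints prune the space mechanically, with no need for the representational change that characterizes insight problems.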

#### 9.5.1.3 Sequential Problems

Sequential problems are those that require a series of steps to solve, steps that are reflected in externally visible changes in the state space. Let us start with the "Cannibals and Missionaries" problem (also known as "Orcs and Hobbits"; the more generic labels are river-crossing, "move", or "transformation" problems). In this task, representatives of each group (cannibals and missionaries) have to be transported from one side of a river to the other. A boat offers space for only a limited number of people. The major rule for solving the problem is that neither on the banks nor on the boat may the number of cannibals exceed the number of missionaries, because otherwise the cannibals would do what their name suggests. To avoid such a catastrophe, careful maneuvering is demanded. According to the model developed by Jeffries, Polson, Razran, and Atwood (1977), subjects working on this task consider only single-step move sequences. These moves are selected according to two simple rules: (a) search for better states (in terms of less distance to the goal state), and (b) avoid states that have been previously visited.
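The state space of this problem is small enough to search exhaustively. The sketch below assumes the standard version (three missionaries, three cannibals, a two-person boat) and finds the shortest sequence of crossings by breadth-first search; it is an illustration of the state-space view, not a model of human solvers:

```python
from collections import deque

def solve_river_crossing():
    """Breadth-first search over the Missionaries and Cannibals state space.
    A state is (missionaries_on_left, cannibals_on_left, boat_on_left)."""
    def safe(m, c):
        # On each bank, missionaries are safe if absent or not outnumbered.
        return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

    start, goal = (3, 3, True), (0, 0, False)
    loads = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]  # possible boat loads
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (m, c, left), path = queue.popleft()
        if (m, c, left) == goal:
            return path
        sign = -1 if left else 1  # the boat carries people away from its bank
        for dm, dc in loads:
            nm, nc = m + sign * dm, c + sign * dc
            if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
                state = (nm, nc, not left)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [state]))
    return None

path = solve_river_crossing()
print(len(path) - 1)  # shortest solution: 11 crossings
```

Comparing this exhaustive search with the Jeffries et al. model is instructive: human solvers do not plan the whole path but evaluate single moves locally, which is why they repeatedly get stuck at states that temporarily increase the distance to the goal.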

Another prominent example of a sequential problem, the "Tower of Hanoi", will be presented here in more detail because it is widely used. The problem consists essentially in moving a given set of differently sized, concentric disks from a starting rod to a target rod using an auxiliary rod. Two rules have to be followed: (1) only one disk may be moved at a time; (2) a larger disk may never be placed on top of a smaller one. Figure 9.6 illustrates the problem by showing the entire instance space, that is, all possible positions for the (simple) case of three disks on the three rods.

Figure 9.5: Example of a cryptarithmetic problem: each letter corresponds to one of the digits 0 to 9 (hint: E=5, Y=2). The numbers in each line should produce a correct addition.

The instance space shown in Figure 9.6 explains the attractiveness of the problem for thought research: every single move of the problem solver can be represented as a step through this instance space. At the same time, each intermediate state during the solution process can be evaluated in terms of how far away it is from the required target state. In addition, it is possible to show, for any intermediate state, which path is the fastest to the goal. The process of problem solving can thus be described as a trajectory (a temporal sequence of states) in this space (for an in-depth analysis of the Tower of Hanoi, see Kotovsky, Hayes, & Simon, 1985). For the problem solver, this type of problem is easy to recognize, define, and represent. That is much more difficult in the case of complex tasks.
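The optimal path through the instance space has a simple recursive structure: to move n disks, first move the n−1 smaller disks to the auxiliary rod, then move the largest disk, then restack the n−1 disks on top of it. A minimal sketch:

```python
def hanoi(n, source, target, auxiliary, moves=None):
    """Recursively generate the optimal move sequence for the Tower of Hanoi."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, source, auxiliary, target, moves)  # clear the way
        moves.append((n, source, target))               # move the largest free disk
        hanoi(n - 1, auxiliary, target, source, moves)  # restack on top of it
    return moves

moves = hanoi(3, "left", "right", "middle")
print(len(moves))  # 7 = 2**3 - 1, the seven-step shortest path of Figure 9.6
```

For n disks the optimal solution always takes 2^n − 1 moves, which is one reason the task scales so conveniently for experiments.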

### 9.5.2 Complex Problems

A complex problem shows the following features: (1) *complexity*, in the sense that many variables are involved; (2) *connectivity*, reflecting the fact that relations exist between the variables; (3) *intransparency*, referring to missing or inaccessible information important for the problem-solving process; (4) *dynamics*, in the sense of possible changes of a given situation over time; and (5) *polytely* (from the Greek "poly", many, and "telos", goal), in the sense that many possible goals and objectives are involved and could be pursued. All five features will be explained briefly.

*Complexity*. Complexity in the sense of the number of variables involved plays an important role insofar as human information processing has only a limited capacity. As a consequence, the problem solver must take measures to reduce complexity, such as simplifications, and must be able to deal with the fact that the simplified models can be inaccurate and even wrong in individual cases. For example, to model the complex relationships between world population, energy demand, and resource use, Meadows and colleagues (Meadows, Meadows, Randers, & Behrens, 1972) created a computer-based world model that reduced the complexity of this huge problem to around 100 variables. Even if a large part of the detailed calculations of this model are inaccurate from today's point of view, the consequences and warnings derived from it were correct.

Figure 9.6: The instance space for a Tower of Hanoi with three disks. On top, all three disks are on the left rod (= start); at the bottom right, all three disks are on the right rod (= goal). The shortest path between start and goal follows the right edge from top to bottom in seven moves.

*Connectivity*. With increasing intervariable dependency and connectivity, the effects of interventions in such a network are difficult to predict. As a consequence, the problem solver must map the dependencies into a model that forms the basis of his or her decisions. An example: Interventions in an ecosystem can have side effects that were not expected. One could think of bees dying because of intensified use of pesticides.

*Intransparency*. Intransparency is the lack of information about the problem situation; it turns a complex problem into a decision-making situation under uncertainty. As a consequence, the problem solver must collect the missing information and accept that her decisions may not take all relevant facts into account. For example, in a hospital emergency room, not all desirable and necessary information about a seriously injured accident victim is available to the physician. Nevertheless, action must be taken, and with minimal initial information a picture of the situation must be produced, which is then supplemented piece by piece with further facts.

*Dynamics*. Dynamics of a system refer to the changes of a given state over time. As a consequence, the problem solver must consider possible changes of the given situation and make prognoses about future developments. Potentially resulting time pressure has to be endured. For example, anyone speculating on the stock market usually makes assumptions about future market developments, but occasionally has to realize that the dynamics of the market cannot always be accurately predicted. Another example: In the event of a forest fire, a sudden change in wind direction can considerably disrupt the planning of the fire brigade and even endanger its activities.

*Polytely*. Polytely concerns the number and type of goals that need to be considered. As a consequence, the problem solver must set priorities and thus resolve value conflicts. For example, company leaders usually strive for the highest possible profit. One major factor influencing this goal is the salary of the employees: paying high salaries should lead to more job satisfaction and productivity (good for profit), but at the same time such salaries are costly (bad for profit). An optimal balance for this factor therefore needs to be found, which can be very difficult.

With these descriptions for complex problems in mind, let us look at two of the most prominent examples for this type of task, namely, the political scenario "Lohhausen" and the business scenario "Tailorshop".

#### 9.5.2.1 Lohhausen

The political scenario "Lohhausen", with around 2,000 variables, is one of the most complex scenarios in terms of the number of variables. "Lohhausen" is a small computer-simulated town. In the study with this scenario, 48 student participants were acting as a mayor for a simulation period of 10 years and were to lead the community as effectively as possible (Dörner, Kreuzig, Reither, & Stäudel, 1983). According to the description given by Dörner (1981, p. 165), the small town has about 3,500 inhabitants and its main income comes from a clock factory belonging to the town. In addition to the town administration, there are medical practices, retail shops, a bank, schools, kindergartens, etc. In the simulation, not only economic relations were mapped but also social, demographic, and psychological variables (e.g., satisfaction of the inhabitants). Participants were able to interact with the system in a variety of ways: They could influence the production and sales policy of the municipal factory, they could change tax rates, create work plans for teachers, set up and lease doctor's surgeries, build housing, provide recreational facilities, etc.

Data analysis was essentially based on comparing the 12 best with the 12 worst performing participants with regard to important measures of success such as the population of the town, the number of unemployed people, the condition of the local clock factory, the immigration rate, the satisfaction of the inhabitants, and the capital of the municipality, as well as judgments of the experimenter about the test-taker (e.g., "participant makes an intelligent impression"; subjects did not know these criteria before they started with the simulation).

One of the most important (and surprising) results of this study: intelligence (measured with a conventional intelligence test) was not a predictor of performance in the scenario! This finding called classical intelligence measurement into question as one that assesses only analytical intelligence but neglects "operative intelligence" (Dörner, 1986), which had not yet been measured by conventional IQ tests. This apparent shortcoming of intelligence tests subsequently led to a sharp controversy about the benefits of IQ tests. As a result of this debate, the value of the intelligence component "information processing capability" now appears undisputed (see Wüstenberg, Greiff, & Funke, 2012; Kretzschmar, Neubert, Wüstenberg, & Greiff, 2016; for a meta-analysis: Stadler, Becker, Gödker, Leutner, & Greiff, 2015).

With regard to the successful control of the Lohhausen community, none of the expected predictors (motivation, test creativity, gender, age, subject of study, or previous education of the participants) turned out to be important. Successful "mayors" were instead characterized by strengths in other fields: self-confidence, extraversion, a striving for meaningful information search ("controlled divergent exploration"), and switching between fluctuating and focused thinking all proved advantageous.

Three primary errors in handling the complex system, which occurred with most participants, were highlighted: (1) the lack of consideration of temporal sequences and difficulties in predicting exponential processes; (2) thinking in causal chains instead of causal networks; (3) the superiority of the current motive.

*Difficulties in predicting exponential processes* occur because of a natural tendency to linearize our predictions. Exponential growth can be visualized by the idea of doubling grains of rice on a chessboard square by square: starting slowly with one grain on the first square, two grains on the second, and four on the third, the count passes 1 million by the 21st square and a trillion by the 41st, with all 64 squares together holding about 1.8 × 10^19 grains.
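The chessboard numbers can be verified in a few lines (square k holds 2^(k−1) grains):

```python
# Square k of the chessboard holds 2**(k - 1) grains of rice.
grains = [2 ** k for k in range(64)]

print(f"{grains[20]:,}")   # 21st square: 1,048,576 (over a million)
print(f"{grains[40]:,}")   # 41st square: 1,099,511,627,776 (over a trillion)
print(f"{sum(grains):,}")  # whole board: 18,446,744,073,709,551,615 (about 1.8e19)
```

A linear extrapolation from the first few squares would predict a few hundred grains at the end of the board, which is exactly the misjudgment the Lohhausen participants showed.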

*Thinking in causal chains instead of causal networks* is demonstrated by the human tendency to search for simple cause-effect connections (e.g., "migrants increase the expenses of social security systems") instead of a broader view that sees, for example, also advantages of migrants (increased diversity, increased work force, etc.). Political reasoning is sometimes driven by such causal-chain simplifications.

*Superiority of the current motive* means that humans are driven by their current motives and do not look much into the future. The problems of sustainability fall into this category: We do not want to forgo today's luxury in order to keep our planet in a good shape for the next generation. Such long-term problems suffer from this error tendency.

#### 9.5.2.2 Tailorshop

The business scenario "Tailorshop" presents a profit-oriented enterprise in which workers use production machines to make shirts from fabric; the shirts are then sold on the market. The system consists of a total of 24 variables, 11 of which can be directly influenced by the participants' actions (for a more detailed description, see Danner et al., 2011, or Funke, 2010). The system's core variable is "capital" (balance sheet value), which is connected to 15 of the 24 variables. The task of the problem solver is to manage the Tailorshop over an extended simulation period in such a way that a sustainable profit is generated. Without intervention, the Tailorshop would soon have to file for bankruptcy, as the running costs (storage, wages, rent, etc.) quickly lead to negative figures. This can be avoided by purchasing raw materials, maintaining the machines, and paying the workers a reasonable wage. In addition, the shirt price must be kept competitive. Figure 9.7 shows the variables of the Tailorshop and their connections.

# 9.5.3 Comparing European and American Approaches to Complex Problems

According to Sternberg (1995), a special feature of European research on complex problems, compared with American research, is that European studies use novices as participants, who have to take on leadership tasks with nothing but their everyday routines and without any training or preparation. The American tradition concentrates more on experts in their respective fields. The two approaches can thus be seen as complementary ways of researching the psychology of human thought.

Figure 9.7: Diagrammatic representation of the variables from the "Tailorshop" simulation (sorted by categories; from Engelhart, 2014, p. 30).

## 9.6 Conclusions

Problem solving can be seen as one of the key competencies of the 21st century (Care, Griffin, & Wilson, 2018; Fiore et al., 2018). The argument here is that the labor market is changing more rapidly than ever. The grandfather who trained to be a shoemaker could practice that trade for the rest of his life; today's workforce has to learn and re-learn new tools day by day. This is why problem solving is becoming more and more important, and not only in the workplace. But problem solving may be part of an even more encompassing competency, namely *systems competency* (Funke, Fischer, & Holt, 2018), the ability to handle complex systems. To control such systems and keep them stable requires more than problem solving. And because systems competency needs information and reliable knowledge, critical thinking (Halpern, 2013) becomes important in times of fake news and indoctrination.

Are there any open questions? First, there is still no comprehensive theory of problem solving that applies to the different types of problems. Second, the best way of assessing problem solving remains unclear; the validity of different measurement proposals is under scrutiny (Dörner & Funke, 2017). Third, beyond individual problem solving, the focus will shift to collaborative problem solving (i.e., two or more persons working together on a problem; see, e.g., Care & Griffin, 2017), because modern times require people to work together. It has yet to be shown what the best mixture of collaborative and individual problem solving would be.

### Acknowledgement

The author wants to thank Elena Graf, Dr. Daniel Holt, Laura Krieglstein, and Dr. Katrin Schulze for their insightful comments on a draft version of this chapter. Also, thanks to my co-editor Bob Sternberg for his comments on my chapter. The work underlying the present article was supported by a grant provided by the Deutsche Forschungsgemeinschaft to the author (Fu 173/13 and Fu 173/14).

#### Summary


#### Review Questions


#### Hot Topic

Joachim Funke

In my own research, I have tried to develop new instruments for measuring problem-solving competencies. Inspired by research on complex problems done by Dietrich Dörner in the mid-1970s, I started with an adaptation of his simulation scenario Tailorshop, then decided to develop more formally based scenarios (MicroDYN, MicroFIN). I will briefly present both instruments.

Tailorshop is a microworld in which subjects have to manage a small business simulation over a simulated time period of, for example, 12 months. They can buy machines and raw material, set the wages of their employees, hire and fire workers, and take care of maintenance and attractive sales conditions. In this situation, subjects have to deal with complexity, intransparency, dynamics, and conflicting goals, that is, most of the features characteristic of complex problems.

The development of MicroDYN and MicroFIN was driven by the requirement to construct "batteries" of test items for the purpose of psychometric assessment: what was needed were easy, medium, and difficult items that could be compared directly. Based on formal systems, such batteries were constructed for the world-wide PISA 2012 assessment of problem solving (see Csapó & Funke, 2017).

In the end, questions of validity remain most important: if we want to contribute to an understanding of problem solving "in the wild", we have to explain how managers, politicians, and other leaders make decisions and to predict errors as well as "wise" decisions in the long run (see Dörner & Funke, 2017).

What we need in the 21st century more than ever is *systems competency* (which is more than problem solving; see Funke, Fischer, & Holt, 2018). To understand how people represent complex systems, how they predict the future states of such systems, and how difficult it might be to make goal-directed interventions without producing unwanted side-effects: these are goals for my future research.

#### References

Csapó, B. & Funke, J. (Eds.) (2017). *The nature of problem solving. Using research to inspire 21st century learning*. Paris: OECD Publishing. doi:10.1787/9789264273955-en


# References


*ing: The European perspective* (pp. 65–99). Hillsdale, NJ: Lawrence Erlbaum Associates.


# Glossary


# Chapter 10

# Decision Making

#### JULIA NOLTE, DAVID GARAVITO & VALERIE REYNA

Cornell University

Choice is ubiquitous, from small decisions such as whether to bring an umbrella to life-changing choices such as whether to get married. Making good decisions is a lifelong challenge. Psychologists have long been fascinated by the mechanisms that underlie human decision making. Why do different people make different decisions when offered the same choices? What are common decision making errors? Which choice option is the "best" and why? These questions are addressed in this chapter.

We first outline models and theories of decision making, defining key concepts and terms. We then describe the psychological processes of decision makers and how these approaches can sometimes lead to systematic biases and fallacies. We touch on the related subject of judgment because of the close relationship with decision making in the literature.

# 10.1 Types of Models of Decision Making

Early theories of decision making were often normative in nature. Normative models characterize optimal or ideal decision making, for example, consistently choosing options that yield greater utility, or overall usefulness of goods (von Neumann & Morgenstern, 1944). Often, this boils down to choosing so as to maximize money. Psychologists, beginning with Simon (1956), pointed out that humans rarely choose optimally because their information-processing capacities are bounded; hence, he introduced the term bounded rationality to describe this limited rationality and described human beings as satisficers, who choose the first available option that satisfies a given threshold, rather than optimizers, who choose the best option of the set (Payne, Bettman, & Johnson, 1988).

Descriptive models describe real-life behavior in which decision makers fall short of maximizing. Descriptive models characterize how decision makers actually make choices and explain why they do so. These models do not prescribe how decision makers ought to behave if they want to accomplish specific decision goals.

Prescriptive models attempt to bridge the gap between normative and descriptive models. These approaches recommend which steps to take in order to achieve certain normative goals, for example as guidelines or decision aids in real-world contexts. They include Bransford and Stein's (1984) IDEAL framework, Sternberg's (1986) problem-solving model, the GOFER model of decision making (Mann, Harmoni, & Power, 1991), and Guo's (2008) DECIDE model of decision making.

### 10.2 Foundational Concepts

One of the foundational concepts underlying models of decision making is expected value (EV; Knutson, Taylor, Kaufman, Peterson, & Glover, 2005). EV is calculated by multiplying the objective probability of the occurrence of an event by the magnitude of the possible outcome (e.g., winning \$10,000). Probability is expressed as a number ranging from 0 (impossible to occur) to 1 (certain to occur). Thus, the EV of gaining \$10,000 with a 0.50 probability would be \$5,000 because \$10,000 x 0.50 = \$5,000.
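The EV computation generalizes directly to any set of probability-outcome pairs. A minimal sketch, using the two options discussed in this chapter:

```python
def expected_value(outcomes):
    """Expected value: sum of probability x magnitude over all possible outcomes."""
    return sum(p * x for p, x in outcomes)

option_a = [(1.00, 5_000)]              # $5,000 for sure
option_b = [(0.50, 10_000), (0.50, 0)]  # 50/50 gamble for $10,000 or nothing
print(expected_value(option_a), expected_value(option_b))  # 5000.0 5000.0
```

Both options have the same EV, which is precisely why the choice between them reveals risk attitudes rather than arithmetic ability.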

From a mathematical perspective, the option with the higher objective EV is the "better" or more desirable choice option. However, options that have the same EV are not equally attractive to many decision makers. Consider a choice between gaining \$5,000 for sure (option A: \$5,000 x 1.00 probability = \$5,000) versus a 0.50 probability of gaining \$10,000 versus a 0.50 probability of gaining \$0 (option B: \$10,000 x 0.50 + \$0 x 0.50 = \$5,000). Although both options offer the same EV, economists would describe option B as riskier than option A because its outcome is more variable and therefore more uncertain (Fox & Tannenbaum, 2011). By contrast, some psychologists define risk more broadly, encompassing behaviors such as drug abuse with potentially negative outcomes (e.g., death due to drug overdose). Uncertainty differs from ambiguity, which arises when an option has unknown probabilities. For example, if option B instead consisted of an unknown chance of gaining \$10,000 (otherwise gaining \$0), the level of uncertainty associated with this choice option would be ambiguous.

Characteristics of a choice option, such as its EV or its levels of risk and uncertainty, are important determinants of the choices a person will make. However, decisions are also influenced by the individual characteristics and preferences of the decision maker, such as their tendencies to avoid or embrace ambiguity and risk.

Although there are exceptions where decision makers are ambiguity-indifferent or ambiguity-seeking (e.g., cancer patients with an unfavorable prognosis; Innes & Payne, 2009), most individuals demonstrate ambiguity aversion (Camerer & Weber, 1992). This means that most people will favor choice options that are unambiguous over options that are ambiguous. Similarly, most decision makers are risk-averse: when choosing between the risk-free option A and the risky option B described above, most people will choose A. Nevertheless, this does not mean that option B is never favored. In fact, risk-seeking individuals would be expected to choose the risky option B, and risk-neutral or risk-indifferent individuals would be expected to choose one of the two options at random. As such, it is impossible to classify risky or risk-free options as better than their respective alternatives; which one is preferred depends on the specific choice at hand, as well as on the subjective perspective of the decision maker.

# 10.3 Theoretical Frameworks

## 10.3.1 Expected Utility Theory

One theory that accounts for subjective effects such as the phenomenon of risk-aversion is expected utility theory (EUT), which describes a classic normative model of decision making. Unlike EV, EUT represents outcomes non-linearly via a negatively accelerated function of objective magnitude (von Neumann & Morgenstern, 1944). Using this function, if the objective magnitude of a reward was continuously increasing at a set rate, the subjective magnitude of the same reward would increase at an increasingly slower rate, hence "negatively accelerated." In other words, particularly at large magnitudes, the subjective value of a reward will be less than its objective value. When EV is equal, objective outcomes are larger in the gamble, and so the value of risky options is discounted more steeply than the value of risk-free options.

For instance, option B may only be worth \$9,950 to a decision maker. This subjective value is then multiplied by the objective probability of the expected outcome to derive the choice option's expected utility. Comparable to options with high EV, options with high expected utility are expected to be preferred over options with low expected utility. A negatively accelerated utility function for outcomes also explains why many decision makers will choose option A with the certain outcome over option B with the more uncertain or risky outcome. However, in most studies measuring risk preferences, decision makers learn about probabilities and outcomes through written (or spoken) description rather than through experience. Learning about outcomes and their probabilities by experiencing them encourages risk-taking. When decision makers rely on feedback, instead of verbal descriptions, to learn about outcomes, they can become risk-neutral or even risk-seeking in the gains domain (and risk-averse for losses; Barron & Erev, 2003; see also Weber, Shafir, & Blais, 2004).
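The risk aversion implied by a negatively accelerated utility function can be illustrated with any concave function; the square root below is an arbitrary illustrative choice, not a function EUT itself prescribes:

```python
import math

def expected_utility(outcomes, utility=math.sqrt):
    """Expected utility with a concave (negatively accelerated) utility function.
    sqrt is just one illustrative choice of such a function."""
    return sum(p * utility(x) for p, x in outcomes)

sure_thing = [(1.00, 5_000)]            # option A: $5,000 for sure
gamble = [(0.50, 10_000), (0.50, 0)]    # option B: 50/50 for $10,000 or nothing
print(expected_utility(sure_thing))  # about 70.7
print(expected_utility(gamble))      # exactly 50.0: the gamble is worth less
```

Although both options have the same EV of \$5,000, the concave utility function shrinks the large \$10,000 outcome disproportionately, so the sure thing wins; this is risk aversion expressed as arithmetic.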

# 10.3.2 Subjective Expected Utility Theory

In 1954, the statistician L. J. Savage further refined the idea of subjectivity by introducing subjective expected utility theory (SEU). SEU accounts for the subjective perception of probabilities through a nonlinear transformation of objective probabilities. (This work was one of the major influences on prospect theory, described below, which also assumes nonlinear perceptions of probabilities.) Accordingly, SEU posits that a choice option's subjective value is multiplied by its subjective probability to estimate its subjective expected utility. Options with higher subjective expected utility are hypothesized to be favored over options with lower utility.

# 10.3.3 Prospect Theory

In 1979, psychologists Kahneman and Tversky proposed an alternative to both EUT and SEU called prospect theory (PT; Figure 10.1). PT not only accounts for subjectivity in perceived outcomes and probabilities but also proposes the notion of relative change (i.e., from a specific reference point or status quo; Kahneman & Tversky, 1979). According to PT, outcomes, even when they are objectively equivalent, are subjectively perceived as either upward ("gains") or downward ("losses") adjustments away from a reference point (Tversky & Kahneman, 1986). As a result, PT can explain crucial decision making phenomena such as the framing effect or loss aversion.

#### 10.3.3.1 Framing Effect

The framing effect describes a shift in risk preferences that arises when the same information is framed either as a "loss" (which typically leads to risk-taking, that is, choosing a risky gamble over a sure option) or as a "gain" (which leads to risk-avoidance, that is, choosing a sure option over a gamble). To illustrate this effect, recall the two choice options we introduced earlier: A, gaining \$5,000 for sure, and B, a 0.50 probability of gaining \$10,000 versus a 0.50 probability of gaining \$0. As we discussed, many decision makers prove risk-averse when confronted with these choices, and will therefore select the first option (A).

Now, assume that instead of being faced with the possibility of winning money (that is, a "gain" frame), decision makers are given \$10,000 and told they might lose money (a "loss" frame). Specifically, decision makers can either lose \$5,000 for sure or take the risk of a 0.50 probability of losing all \$10,000 versus a 0.50 probability of losing \$0. In this context, many decision makers are risk-seeking. This means they prefer the risky option B to the sure loss of \$5,000 in option A. Accordingly, many decision makers reverse their preferences, from risk-avoidance to risk-seeking, depending on the reference point they are given.
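A minimal sketch of PT's value function reproduces this reversal. The exponent and loss-aversion coefficient below are the median estimates reported by Tversky and Kahneman (1992); probability weighting is omitted for simplicity:

```python
ALPHA = 0.88   # curvature; median estimate from Tversky & Kahneman (1992)
LAMBDA = 2.25  # loss-aversion coefficient, same source

def v(x):
    """Prospect-theory value function: concave for gains, convex and
    steeper for losses, measured relative to the reference point (0)."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** ALPHA

# Gain frame: $5,000 for sure vs. a 50/50 shot at $10,000.
gain_sure = v(5000)
gain_gamble = 0.5 * v(10000) + 0.5 * v(0)
assert gain_sure > gain_gamble      # risk-averse for gains

# Loss frame: lose $5,000 for sure vs. a 50/50 shot at losing $10,000.
loss_sure = v(-5000)
loss_gamble = 0.5 * v(-10000) + 0.5 * v(0)
assert loss_gamble > loss_sure      # risk-seeking for losses
```

The same curvature that makes \$5,000-for-sure attractive in the gain frame makes the sure \$5,000 loss unattractive in the loss frame.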

By showing that decision makers prefer different choice options depending on the way choices are being presented to them, PT challenges the traditional economic belief that a person's risk preferences are consistent. A psychological approach would be to say that risk preference is not a fixed disposition (Becker, 1976). However, decision science is concerned with the fact that framing effects violate the invariance assumption of EUT, thereby challenging a fundamental assumption that human beings are rational (i.e., have coherent preferences).

#### 10.3.3.2 Reference Point

Like EUT and SEU, PT hypothesizes that decision makers become less sensitive to changes in gains or losses the farther these values move away from the reference point. For example, the difference between gaining either \$5,000 or \$10,000 is believed to feel more significant to the decision maker than the difference between \$105,000 and \$110,000. This is true even though in both cases, the two choice options differ by an absolute value of \$5,000. This is because \$105,000 and \$110,000 are much farther away from zero than \$5,000 and \$10,000 are.
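With any concave value function, the same \$5,000 gap shrinks subjectively as both amounts move away from the reference point. The power function and its exponent below are illustrative choices, not values asserted by the chapter:

```python
ALPHA = 0.88  # illustrative curvature for the gain branch

def v(x):
    """Concave subjective value for gains, relative to a reference point of 0."""
    return x ** ALPHA

near_gap = v(10_000) - v(5_000)      # difference close to the reference point
far_gap = v(110_000) - v(105_000)    # identical $5,000 difference, farther out

# Diminishing sensitivity: the nearby difference feels larger.
assert near_gap > far_gap > 0
```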

#### 10.3.3.3 Loss Aversion

PT further holds that decision makers not only perceive changes differently when they move away from the reference point, but also depending on their direction compared to the reference point (that is, based on whether changes represent gains or losses). The concept of loss aversion follows from the observation that to decision makers, losses "feel" worse than gains of the same magnitude "feel" good (Tversky & Kahneman, 1992). Consequently, decision makers are believed to be more motivated to avoid a loss of a certain value than they are to obtain a gain of objectively equivalent value. PT's framework incorporates loss aversion by modeling a steeper loss function than gain function in its valuation of outcomes, yielding a distorted S-shape, with a flatter top and a longer bottom.

#### 10.3.3.4 Probability Weighting Function

In addition, PT proposes a probability weighting function. According to this function, decision makers do not perceive differences in probabilities realistically either. Instead, they underestimate moderate to high probabilities and overestimate small probabilities. As a result, decision makers may wrongly anticipate the occurrence of very unlikely events, such as winning the lottery or dying in a plane crash, but fail to anticipate more common events, such as experiencing a car crash.
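One widely used functional form for such weighting comes from Tversky and Kahneman (1992); the specific formula and parameter are assumptions of that model rather than of this chapter, but they illustrate the inverse-S pattern described above:

```python
GAMMA = 0.61  # median estimate for gains from Tversky & Kahneman (1992)

def w(p):
    """Inverse-S probability weighting: overweights small probabilities,
    underweights moderate-to-large ones."""
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

assert w(0.01) > 0.01   # a rare event (e.g., a lottery win) is overweighted
assert w(0.90) < 0.90   # a common event is underweighted
```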

In sum, theories of decision making such as EUT, SEU, and PT predict that decision makers rarely make decisions grounded in the objective characteristics of the choice options they are considering. Instead, decision makers seem to base their choices on subjective perceptions of objective information and on personal preferences relating to risks, rewards, and losses. However, predictions made by EUT, SEU, and PT are not always good descriptions of actual decision making, even at the group level (e.g., Reyna, Chick, Corbin, & Hsia, 2014); we return to this topic below when we discuss an alternative to these theories, fuzzy-trace theory.

# 10.4 Dual Process Theories of Decision Making

# 10.4.1 System 1 and System 2

More recently, decision making researchers including Nobel Laureate Daniel Kahneman have proposed so-called dual process theories of judgment and decision making. This type of theory contrasts intuitive, impulsive decision making (also called "System 1" reasoning) with rational and logical deliberation ("System 2" reasoning; Kahneman, 2003, 2011; Stanovich & West, 2008; see also "Type 1" and "Type 2" processes in Evans & Stanovich, 2013).

Figure 10.1: The value function that passes through the reference point is S-shaped and asymmetrical. The value function is steeper for losses than for gains, indicating that losses outweigh gains. ©Marc Oliver Rieger, CC BY-SA 3.0, https://en.wikipedia.org/

Dual process theories generally characterize fast, automatic "System 1" reasoning as the major source of decision making biases (Kahneman, 2003, 2011; but see Duke, Goldsmith, & Amir, 2018, for contradictory evidence). According to EUT and PT, biases such as the framing effect can lead to seemingly irrational judgments of reality or decision making that is not always advantageous. To reiterate, the framing effect occurs when people's subjective perception of different choice options varies depending on how the options are portrayed or phrased, even when, objectively, the choice options are equivalent. We return to the framing effect later in this chapter to discuss when such technically irrational biases can actually turn out to be smart (Reyna, 2018).

#### 10.4.1.1 Temporal Discounting

Dual process theories have also been applied to temporal discounting. Temporal discounting is the tendency to assign a smaller subjective value to a delayed reward than to an immediate reward (Kirby, 2009; McClure, Laibson, Loewenstein, & Cohen, 2004; but see Kable & Glimcher, 2007). Discounting distant outcomes can lead decision makers to choose smaller, immediate rewards over larger, delayed rewards, and therefore to decrease the magnitude of their overall gains. Depending on how steeply decision makers discount over time, their choices can also violate the consistency of their preferences.
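Hyperbolic discounting, V = A / (1 + kD), is one common model of this tendency. The per-day discount rate k below is a hypothetical value chosen for illustration; the sketch also shows the kind of preference reversal that makes discounted choices inconsistent:

```python
def discounted_value(amount, delay_days, k=0.02):
    """Hyperbolic discounting V = A / (1 + k * D); k is a hypothetical
    per-day discount rate used only for illustration."""
    return amount / (1 + k * delay_days)

# $100 now vs. $110 in 30 days: the immediate reward wins...
assert discounted_value(100, 0) > discounted_value(110, 30)

# ...but push both rewards 300 days into the future and the preference
# reverses, the inconsistency described in the text.
assert discounted_value(100, 300) < discounted_value(110, 330)
```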

In psychological research, higher rates of temporal discounting have been linked to impulsivity and unhealthy risk-taking such as drug and alcohol abuse (Bickel, 2012; Bickel et al., 2012; Story, Vlaev, Seymour, Darzi, & Dolan, 2014). Accordingly, some researchers have drawn connections between impulsive "System 1" reasoning and higher rates of temporal discounting (that is, higher rates of making suboptimal choices). For example, McClure and colleagues (2004) suggest that distinct neural systems activate when people make impulsive versus patient (willingness to wait for larger rewards) choices in temporal discounting tasks. Alternatively, according to Ballard and Knutson (2009), some brain regions are more sensitive to the magnitude of future rewards while other brain regions are more sensitive to the delay of future rewards. This can affect the perceived value of immediate and delayed choice options and may lead decision makers to perceive delayed rewards as less desirable than immediate rewards.

# 10.4.2 Developmental Dual Process Theories

"System 1" reasoning is traditionally assumed to be phylogenetically and ontogenetically less advanced than "System 2" reasoning, which increases with maturation (Steinberg, 2008). Thus, dual process theories cannot explain why, rather than becoming less pronounced, the strength of the framing effect has been shown to increase with age and experience (Reyna & Ellis, 1994; Reyna & Farley, 2006; Reyna et al., 2011, 2014). In the context of standard dual process theories, this finding is out of place, as mature decision makers are expected to become less susceptible to reasoning biases that have been explained in terms of "System 1" processing, not more. For that and many other reasons, more recently developed theories aim at rethinking some of the core assumptions of standard dual process theories.

# 10.4.3 Fuzzy-Trace Theory

One such theory is fuzzy-trace theory (FTT). Put forward by psychologists Reyna and colleagues (e.g., Reyna, 2012), FTT is a modern dual process theory that distinguishes between developmentally advanced intuition and mere impulsivity, which is believed to be developmentally inferior (Reyna, Weldon, & McCormick, 2015). FTT posits that a person encodes information simultaneously into verbatim representations, which are composed of surface-level details, and gist representations, which capture bottom-line meaning. Although roughly categorized as a dual process theory, FTT technically assumes that information is being processed and represented on a continuum between precise, verbatim details on the one end and vague, abstract gists on the other. Verbatim details include concrete numbers, exact wording, and other surface-level information (e.g., "Treatment A has a 30% risk of experiencing side effects."). Conversely, gist describes the fuzzy meaning underlying such details (e.g., "Treatment A is risky").

#### 10.4.3.1 Hierarchy of Representations

The theory posits that the gist of information is encoded at varying levels of abstraction to form a hierarchy of representations, and evidence supports this prediction: The simplest level of gist representation is grounded in categorical yes-or-no distinctions, such as whether or not a choice option entails any level of risk. Imagine deciding between treatment A with a 10% risk of side effects and treatment B with a 0% risk of side effects. Here, a categorical gist representation could be "Treatment A is risky. Treatment B is not risky". More refined representations require ordinal less-or-more distinctions. If treatment A comes with a 10% risk and treatment B with a 5% risk, the corresponding representation might take the shape of "Treatment A has a higher risk than Treatment B". Finally, the most precise representations of information call for exact details, such as "Treatment A has a 10% risk of reducing life expectancy by 1 year while treatment B has a 5% risk of reducing life expectancy by 2 years". Which representation will be relied on is ultimately determined by the specificity of the choice at hand, with a preference for the least-detailed representation that allows for a decision (dubbed the "fuzzy-processing preference"; Corbin, Reyna, Weldon, & Brainerd, 2015; Reyna & Brainerd, 2008; Reyna & Lloyd, 2006).
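The fuzzy-processing preference can be caricatured as "use the coarsest representation that still discriminates". The function below is a toy sketch of that idea, not the authors' formal model; the risk values are taken from the treatment examples in the text:

```python
def preferred_representation(risk_a, risk_b):
    """Pick the least-detailed gist that still discriminates between two
    treatments' risk levels (a sketch of FTT's fuzzy-processing
    preference, not Reyna and colleagues' formal model)."""
    if (risk_a > 0) != (risk_b > 0):
        return "categorical"   # "one is risky, the other is not"
    if risk_a != risk_b:
        return "ordinal"       # "one is riskier than the other"
    return "verbatim"          # only exact details can settle the choice

assert preferred_representation(0.10, 0.00) == "categorical"
assert preferred_representation(0.10, 0.05) == "ordinal"
assert preferred_representation(0.10, 0.10) == "verbatim"
```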

#### 10.4.3.2 Developmental Trajectories

According to FTT, decision makers shift from verbatim to gist-based processing as they develop (Mills, Reyna, & Estrada, 2008; Reyna, 2012; Reyna & Brainerd, 2011; Reyna & Lloyd, 2006). In the context of FTT, gist-based processing serves intuition, here defined as an advanced ability to extract meaning and recognize patterns (Reyna, 2012). Since intuition is acquired through age, experience, and expertise, intuitive decision making is believed to be different from impulsive decision making, which peaks in adolescence and becomes less common with age. Accordingly, adults rely more on gist-based processing (as opposed to verbatim processing) than adolescents.

#### 10.4.3.3 Risk-Taking and Risk Avoidance

Reducing choice options to their bottom-line gist enables decision makers to categorically reject catastrophic risks, without trading off risk against the reward a risky choice option offers. Gist-reliance is often negatively associated with unhealthy risk-taking, whereas verbatim-based processing and impulsivity are often positively related to risk-taking (along with reward sensitivity and impulsivity, explaining unique variance in why adolescents are more risk-prone; Mills et al., 2008; Reyna & Farley, 2006; Reyna & Mills, 2014; Reyna et al., 2015; Wilhelms, Reyna, Brust-Renck, Weldon, & Corbin, 2015). Verbatim-based reasoning leads decision makers to weigh risks against benefits, which can facilitate risk-taking if the risks associated with a choice option are perceived as low and the benefits as sufficiently high. For example, the risk of contracting HIV from unprotected sex is low, so decision makers relying on verbatim representations, when weighing the risk of contracting HIV against the benefits of unprotected sex, will consider taking this risk because the benefits appear to outweigh the risks (Wilhelms et al., 2015). Decision makers relying on gist representations, such as "it only takes once to get HIV", would not take the risk of contracting HIV, a catastrophically bad outcome (i.e., no risk of contracting HIV is better than some risk of contracting HIV). Evidence supports these theoretical tenets.

#### 10.4.3.4 Standard and Reverse Framing

When the risks and benefits of choice options differ considerably in size, children, whose processing veers closer to verbatim-based than gist-based processing, tend to process risks more objectively and thus do not show irrational framing biases. Some young people, especially those who are sensitive to rewards (e.g., adolescents), may exhibit reverse framing when rewards are large, preferring gambles for gains and sure losses over risky losses (Reyna et al., 2011; Reyna & Farley, 2006). In reverse framing, a person tends to make the opposite choices to those made in the typical framing effect (that is, choosing the risky gamble in the "gain" frame and the sure option in the "loss" frame). This effect, however, does not carry over into adulthood: Adults, with their greater tendency to rely on the simple gist of choices (such as "losing something for sure" versus "losing something or losing nothing" in a "loss" frame), tend to produce the standard framing effect (Chick & Reyna, 2012; Reyna et al., 2011). Young children do not show framing effects (Reyna & Ellis, 1994). Standard framing first emerges when differences in outcomes are small. When differences are substantial, older children and adolescents display reverse framing by favoring larger but risky rewards over smaller but safe rewards. This preference for reverse framing becomes stronger as adolescents' reward sensitivity develops. The tendency to rely on gist increases into adulthood, in which most decision makers demonstrate standard framing.

#### 10.4.3.5 Developmental Reversal

As initially predicted by FTT, the standard framing effect increases with age and experience (e.g., Kim, Goldstein, Hasher, & Zacks, 2006; Reyna et al., 2014), which is at odds with the assumptions of standard dual process theories. According to these theories, greater development leads to greater reliance on slow, effortful "System 2" reasoning and hence to fewer biases, such as the framing effect, in judgments and decisions, in contrast to what the literature has shown (Wilhelms & Reyna, 2013; but see Peters et al., 2006). FTT conceptualizes the increase in the framing effect with age, along with other developmental biases that contradict the predictions of standard dual process theories (such as an increase in the production of false memories), as a developmental reversal (Brainerd, Reyna, & Ceci, 2008; De Neys & Vanderputte, 2011; Reyna & Ellis, 1994; Reyna et al., 2011). Per FTT, developmental reversals occur when less mature decision makers, such as children and adolescents, "outperform" mature decision makers on certain types of decision tasks. Research grounded in FTT suggests that developmental reversals result from an increase in gist-based reasoning with age and experience, which makes mature decision makers more susceptible than children and adolescents to reasoning biases that originate in gist-based reasoning (Reyna & Brainerd, 2011; Weldon, Corbin, & Reyna, 2013).

# 10.5 Heuristics and Biases

# 10.5.1 Bounded Rationality

Bounded rationality assumes that decision makers are often unable to deliberate each decision slowly and carefully (Simon, 1957; 1991). In other words, decision makers will not always be able to rely on "System 2" processing as it is described through standard dual process theories, even if they are mature and experienced in making decisions. Instead, finite cognitive resources, time constraints, and incomplete information can drive decision makers to fall back on so-called heuristic processing, which is associated with "System 1" processing.

Heuristics are "recipes" or rules of thumb that serve as fast and efficient mental shortcuts to simplify many of the decisions and judgments we need to make every day (Gigerenzer & Gaissmaier, 2011). The use of heuristics is assumed to be adaptive and can be highly successful, but heuristics also give rise to biases similar to the reasoning errors we have already introduced in this chapter. When psychologists Amos Tversky and Daniel Kahneman introduced the heuristics-and-biases research program in the 1970s (e.g., Tversky & Kahneman, 1974), multiple heuristics and biases were identified. Here, we describe some of the best known of them. Although Gigerenzer and Gaissmaier (2011) emphasize the adaptive nature of heuristics and biases, Tversky and Kahneman also argued in favor of overall adaptiveness (and similarly relied heavily on Simon), but designed tests that revealed human limitations and fallacies. One difference between these approaches is definitional: heuristics can be described as processing only part of the available information in a simple way (Gigerenzer & Gaissmaier), as substituting one kind of judgment that comes more readily to mind (e.g., similarity) for another (e.g., probability; Kahneman, 2003), or as processing meaningful gist rather than superficial details (Reyna, 2012). Although some scholars have challenged traditional norms of rationality, assertions about alternatives such as ecological rationality (the degree to which a heuristic is adapted to the structure of the environment) are difficult to test scientifically.

#### 10.5.1.1 Availability Heuristic

To judge the relative probability or frequency of an event, the availability heuristic relies on the ease with which people recall examples associated with different choice options or events. For instance, when asked whether there are more words in the English language that have R as their first or as their third letter, most people—incorrectly—choose the former (Tversky & Kahneman, 1973). This occurs because words that start with a certain letter are more readily available for us to recall than other types of words. In everyday life, decision makers often rely on salient information in their environment (such as information publicized in the news) to evaluate how likely they are to contract certain diseases or to experience specific events, such as a shark attack (e.g., Read, 1995). Because rare and unexpected events are more likely to be publicized than expected events, people will sometimes overestimate the likelihood of uncommon events and underestimate the likelihood of more common events.

More generally, it is crucial to read original articles (rather than only secondhand summaries of them) to fully understand the arguments and counterarguments in the decision making literature. For example, Gigerenzer and Gaissmaier (2011) say that "Neither version of the availability heuristic could predict participants' frequency estimates. Instead, estimated frequencies were best predicted by actual frequencies" (p. 458), but the second sentence of Tversky and Kahneman's (1973) article on the availability heuristic makes a similar point (p. 207): "In general, availability is correlated with ecological frequency, but it is also affected by other factors."

#### 10.5.1.2 Recognition Heuristic

In a similar vein, decision makers employ the recognition heuristic to make judgments about pairs of objects or events they have limited knowledge about. Students from Germany and the U.S. were tasked to compare pairs of American or German cities with regard to the size of their populations (Gigerenzer & Goldstein, 1996; Goldstein & Gigerenzer, 2002). Since Americans lacked detailed knowledge about German cities and vice versa, participants simply relied on whether or not they recognized the name of foreign cities (a less-is-more effect). If they recognized only one of the two cities in a pair, they inferred that this city had a bigger population, substituting familiarity for knowledge.

#### 10.5.1.3 Affect Heuristic

People can rely on a different heuristic when evaluating which of two choice options is the riskier one: When comparing risks, the affect heuristic implies that dread increases perceived risk, even when objective probabilities do not warrant this inference (Slovic, 1987). This can skew individuals' understanding of risk-benefit tradeoffs: Although in real life, risks and benefits can be positively correlated (meaning high risks come with high rewards), relying on the affect heuristic has been linked to the perception of an inverse relationship between risks and benefits. Objects or activities that elicit positive affect are typically believed to be high in benefits and low in risks, whereas the opposite is true for objects or activities that evoke negative feelings such as dread (Finucane, Alhakami, Slovic, & Johnson, 2000; Slovic, 1987).

#### 10.5.1.4 Confirmation Bias

Another bias that affects decision makers' ability to reason objectively is confirmation bias. This bias describes people's tendency to selectively seek, attend to, or recall evidence that supports one's initial opinion (Plous, 1993). Similarly, people have been found to be biased in their interpretation of information lacking clear meaning, construing whichever meaning best fits their personal attitudes. In a seminal experiment, proponents and opponents of the death penalty read two scientific studies examining whether or not the death penalty deterred murder (Lord, Ross, & Lepper, 1979). While one study found that murder rates decreased in those U.S. states that had introduced the death penalty, the other study found no effect of the death penalty. Unbeknownst to the participants, both studies were entirely fictional. In line with a confirmation bias, participants thought that the study supporting their personal stance on the death penalty was more probative than the study contradicting their beliefs, of which they were markedly more critical.

#### 10.5.1.5 Hindsight Bias

Also referred to as the "I-knew-it-all-along" effect, hindsight bias is observed when, after an event occurs, decision makers overestimate how predictable the outcome was in the first place (Fischhoff, 2007). In one of the first studies designed to test the hindsight bias, decision scientists Fischhoff and Beyth (1975) tasked decision makers to evaluate the probability of several possible outcomes associated with President Nixon's then-upcoming visit to China and Russia. Following Nixon's return to the U.S., participants overestimated the probabilities they had assigned to those outcomes that ended up occurring, exaggerating how foreseeable these events had factually been.

#### 10.5.1.6 Endowment Effect

Some phenomena have not been labeled biases, even though they produce biased judgments and decisions. For example, the endowment effect leads individuals to overestimate the objective value of objects they own, simply because they own them (Kahneman, Knetsch, & Thaler, 1991). This means that people are more partial to the same object when it is in their own possession than when it is in somebody else's. In transactions, the endowment effect manifests as an unwillingness to trade objects one owns (Knetsch, 1989), or as a demand for an exaggerated price in exchange for parting with them. In a famous demonstration of this effect, decision makers who were given a mug charged approximately twice as much money to part with it as they were willing to spend to acquire the mug when they did not own it (Kahneman, Knetsch, & Thaler, 1990).

#### 10.5.1.7 Sunk-cost Fallacy

Similar to the attachment people feel towards their belongings or property, people also grow attached to past investments. As a result, decision makers often continue to invest time, money, or effort into previously made commitments, even when these commitments fail to pay off. This bias, labeled the sunk-cost fallacy, arises because people dislike incurring the loss of resources they have already invested in an endeavor (Arkes & Blumer, 1985). To provide an example, imagine that you have made a non-refundable down payment on a nice watch that you plan on gifting to your father. After making the down payment, you come across a different watch that you like better. But since you do not want to waste the money you have already invested, you purchase the watch you saw first instead of the watch you prefer. This fallacy is typically explained in terms of loss aversion (which we introduced earlier in this chapter), as it aligns with the assumption that decision makers are more motivated to avoid losses (e.g., losing the money invested in the first watch) than to acquire gains (e.g., buying the nicer watch; Tversky & Kahneman, 1986).

#### 10.5.1.8 Status Quo Bias

But even when no prior investments are involved, many people perceive any change away from an existing choice to another choice option as a loss of sorts: The status quo bias (also known as the default effect) treats default settings or previous choices as reference points that are typically preferred over alternative choice options (Samuelson & Zeckhauser, 1988). For instance, countries that have implemented an opt-out policy for organ donation report much higher consent rates to organ donation than countries in which willing potential donors have to actively opt in (Johnson & Goldstein, 2003). According to Daniel Kahneman and Amos Tversky (1982), this could be because individuals regret their choices more strongly when they suffer negative consequences as a result of a new action than when they experience negative consequences as a result of inertia. PT suggests that the status quo acts as a reference point for all subsequent decisions, and that the prospect of potential losses associated with leaving the reference point outweighs the prospect of potential gains (because losses loom larger than gains).

#### 10.5.1.9 Anchoring Effect

The anchoring effect, another bias, is evident when individuals base their decisions around an initial "anchor value" they encounter, even when this value is unrelated to the question at hand (Tversky & Kahneman, 1974). Once an anchor is in place, subsequent decisions are made by deviating away from this value, which leads to substantial biases in the estimation of prices and other numbers. For example, Ariely, Loewenstein, and Prelec (2003) asked MIT students to write down the last two digits of their social security number and then prompted them to bid for objects such as chocolate or computer equipment. Individuals with higher numbers made notably higher bids than those with lower numbers, suggesting that people anchored their judgments on their social security numbers—despite the fact that these numbers held no relevant information about the value of the auction items. While any salient number can serve as an anchor, anchors do not have to be random or meaningless: often, anchors are highly relevant to the choice context, such as existing baseline values.

#### 10.5.1.10 Base-rate Fallacy

Anchor values are not the only seemingly irrelevant information that can bias our judgments. Individuals also engage in what is known as the base-rate fallacy, a reasoning error that ignores generic, statistical information in favor of specific, qualitative information (Tversky & Kahneman, 1985). Consider the case of a person named Steve (Kahneman, 2011), who is known to be shy, withdrawn, helpful, and tidy, with great attention to detail and a love for structure but little interest in engaging with people or the real world. When asked whether Steve is more likely to be a farmer or a librarian, many decision makers agree that his personality best suits him to work as a librarian. However, this response neglects to take the underlying base rate into account. In the experiment, this base rate had been presented to favor farmers (see also the representativeness heuristic; Kahneman & Tversky, 1973).
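A small Bayesian calculation shows why the base rate dominates here. All the numbers below are invented for illustration and are not taken from the original experiment:

```python
# Hypothetical figures: suppose farmers outnumber librarians 20 to 1, and
# Steve's description fits 90% of librarians but only 10% of farmers.
farmers_per_librarian = 20
p_desc_given_librarian = 0.9
p_desc_given_farmer = 0.1

# Bayes' rule, using relative counts in place of prior probabilities.
librarian_mass = p_desc_given_librarian * 1
farmer_mass = p_desc_given_farmer * farmers_per_librarian
p_librarian = librarian_mass / (librarian_mass + farmer_mass)

# Despite the strongly "librarian-like" description, the base rate wins:
# Steve is still more likely to be a farmer.
assert p_librarian < 0.5
```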

#### 10.5.1.11 Conjunction Fallacy

When passing judgment, people are similarly prone to committing what is commonly referred to as a conjunction fallacy: the incorrect assumption that a combination of two or more conditions is more likely to occur than one of these conditions by itself. The most well-known example in this context is that of the fictional "Linda", who is "31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations" (Tversky & Kahneman, 1983, p. 297).

Given this information, is it more likely that Linda is a bank teller, or that Linda is a bank teller who is active in the feminist movement? Since the latter is more aligned with Linda's personality, the majority of people side with the second rather than the first option. This type of reasoning, however, is erroneous: the probability of a single event (i.e., Linda being a bank teller) must necessarily be at least as high as the probability of the joint event (i.e., Linda being a bank teller and an activist), because the joint event is a subset of the more inclusive one. This fallacy is often explained through the use of the representativeness heuristic (Kahneman & Tversky, 1972). This heuristic draws comparisons between specific cases (e.g., Linda's characteristics) and a standard or parent population (e.g., feminists), sometimes resulting in the incorrect conclusion that because something is more representative, it is also more probable.
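The conjunction rule itself is one line of arithmetic. The probabilities below are invented purely for illustration; the point is only that a joint probability can never exceed either of its constituents:

```python
# For any events A and B, P(A and B) <= P(A): feminist bank tellers are a
# subset of bank tellers. These numbers are hypothetical.
p_bank_teller = 0.05
p_feminist_given_teller = 0.30

p_teller_and_feminist = p_bank_teller * p_feminist_given_teller  # 0.015

# The conjunction is necessarily no more probable than the single event.
assert p_teller_and_feminist <= p_bank_teller
```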

Finally, decisions and judgments are often shaped by social factors. Attribution bias is the common tendency to generate different explanations for one's own behavior than for other people's behavior (Ross, 1977): When people evaluate their own actions (such as cutting in line), they often attribute them to external or contextual factors (e.g., being late for work). However, when interpreting other individuals' actions, people often believe that behavior is driven by internal factors characteristic of the person (such as cutting in line due to rudeness), possibly because they are unaware of the external factors that affect other people's lives. Aside from such dispositional and situational attributions, judgments and choices are often governed by social norms. Norms act as implicit or explicit guidelines that inform individuals whether or not to make a certain decision based on what other people around them do or expect them to do.

In this context, psychologists typically differentiate between injunctive and descriptive norms that influence decision making (Cialdini, Reno, & Kallgren, 1990). Injunctive norms outline which behaviors are socially desirable or acceptable, such as tipping a waitress, stopping at a red traffic light, or abstaining from underage drinking. Descriptive norms are perceptions of other people's actual behavior. Consider, for example, an adolescent who is attending a party at a friend's house. This adolescent may decide to embrace underage drinking because she knows or believes that other guests are illegally consuming alcohol as well—even if injunctive norms (such as the law, or her parents' rules) prohibit it. As a result, injunctive and descriptive norms will not always overlap, even though in many cases, they do.

## 10.6 Decision Strategies

As discussed, cognitive, social, and situational factors lead decision makers to base their decisions on seemingly irrelevant cues or skew the accuracy of their judgments. In the following part of our chapter, we review which strategies individuals employ to engage with and integrate evidence when sufficient information is available to them. The literature sorts these decision strategies into two categories: compensatory and non-compensatory strategies (Hogarth, 1990; von Winterfeldt & Edwards, 1986). Compensatory strategies allow trade-offs between positive and negative values on different choice attributes, whereas non-compensatory strategies take the opposite approach: A positive value in one choice attribute cannot make up for a negative value in another attribute. In practice, this means that some non-compensatory strategies dismiss any choice option that performs poorly on essential choice attributes.

Some of the most commonly studied strategies (e.g., Mata & Nunes, 2010; Svenson & Maule, 1993; Wichary & Smolen, 2016) include non-compensatory, satisficing strategies, such as elimination-by-aspects (EBA; Tversky, 1972) and the take-the-best strategy (TTB) (Hogarth, 1990; von Winterfeldt & Edwards, 1986). EBA requires decision makers to determine which choice attribute is the most important to them and to exclude all choice options from consideration that do not achieve a high enough value on this attribute. This process is then repeated for the second most important attribute (and so forth) until only one choice option prevails (Tversky, 1972). In contrast, TTB simply chooses that option which outperforms other options on a single choice attribute that is deemed "important enough" to enable a decision (or correlated with the outcome; Gigerenzer & Goldstein, 1996). How decision makers know which attributes outperform others is an open question.
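As a rough illustration of how the two satisficing strategies operate, the sketch below implements EBA and TTB on a hypothetical apartment choice. The option names, attribute values, and thresholds are invented for illustration and correspond to nothing in the cited studies (higher values are assumed to be better):

```python
def elimination_by_aspects(options, attribute_order, thresholds):
    """EBA: for each attribute, from most to least important,
    eliminate options that fall below the threshold, until at
    most one option remains."""
    remaining = dict(options)
    for attr in attribute_order:
        if len(remaining) <= 1:
            break
        survivors = {name: vals for name, vals in remaining.items()
                     if vals[attr] >= thresholds[attr]}
        if survivors:          # never eliminate every option
            remaining = survivors
    return remaining

def take_the_best(options, attribute_order):
    """TTB: choose the option that wins on the first (most
    important) attribute that discriminates between options."""
    winners = list(options)
    for attr in attribute_order:
        values = {name: vals[attr] for name, vals in options.items()}
        best = max(values.values())
        winners = [name for name, v in values.items() if v == best]
        if len(winners) == 1:
            return winners[0]
    return winners[0]          # tie on all attributes

# Hypothetical apartment choice (all numbers invented).
apartments = {
    "A": {"rent": 4, "location": 5, "size": 2},
    "B": {"rent": 5, "location": 2, "size": 4},
    "C": {"rent": 4, "location": 4, "size": 4},
}
order = ["rent", "location", "size"]
eba_result = elimination_by_aspects(
    apartments, order, thresholds={"rent": 4, "location": 4, "size": 3})
ttb_result = take_the_best(apartments, order)
```

With these invented numbers, EBA whittles the set down to apartment C (B fails the location threshold, A the size threshold), while TTB picks B outright because rent, the most important attribute, already discriminates.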

Compensatory, optimizing strategies take a different approach than non-compensatory, satisficing strategies do. For example, the weighted additive rule (WADD) weights the attribute values associated with each option by the importance of the respective attribute and then adds them up to determine which option is the most favorable: WADD first multiplies the value of each piece of information with the importance of the relevant choice attribute and then selects the option with the highest sum of these products, as in EV and EUT. Tallying (TALLY) is a special case of the WADD rule in which pros and cons are simply added up without assigning them different weights (EW, equal weighting): EW selects the choice option with the highest unweighted sum of all attribute values, treating all attributes as equally important.
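The compensatory rules can be sketched in a few lines; the apartment data and importance weights below are again invented for illustration:

```python
def wadd(options, weights):
    """Weighted additive rule (WADD): score each option by the
    importance-weighted sum of its attribute values and pick the
    option with the highest score."""
    scores = {name: sum(weights[attr] * value for attr, value in vals.items())
              for name, vals in options.items()}
    return max(scores, key=scores.get), scores

def equal_weighting(options):
    """Tallying / equal weighting (EW): WADD with every weight set
    to 1, i.e. an unweighted sum of attribute values."""
    any_option = next(iter(options.values()))
    return wadd(options, {attr: 1 for attr in any_option})

# Hypothetical apartment choice; all numbers invented (higher = better).
apartments = {
    "A": {"rent": 4, "location": 5, "size": 2},
    "B": {"rent": 5, "location": 2, "size": 4},
    "C": {"rent": 4, "location": 4, "size": 4},
}
best_ew, ew_scores = equal_weighting(apartments)    # C: 4 + 4 + 4 = 12
best_wadd, wadd_scores = wadd(
    apartments, {"rent": 1, "location": 3, "size": 1})  # A: 4 + 15 + 2 = 21
```

Note how the two rules can disagree: EW favors the all-round option C, while a decision maker who weights location heavily ends up with A. Unlike EBA, a weak attribute can here be compensated by strong values elsewhere.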

### 10.6.1 Satisficing versus Optimizing

Since using non-compensatory or satisficing decision strategies requires less cognitive effort than using compensatory or optimizing strategies, satisficing becomes more common with age (e.g., Bruine de Bruin, Parker, & Strough, 2016). Although this suggests that older adults could be more prone to making uninformed choices, simulations demonstrate that employing more demanding strategies leads to only small gains in decision quality compared to non-compensatory or satisficing strategies (Mata & Nunes, 2010). Accordingly, some scholars interpret these null effects (no difference between strategies detected for these decisions) to mean that even resource-efficient choice strategies such as TTB and EBA can allow decision makers to make rather advantageous choices. However, research shows that people generally do not optimize in the strict sense of thoroughly processing all available information, but they do seem to process both outcome and probability information along with other, simpler representations of that information, as predicted by FTT (see Reyna, 2012).

## 10.7 Conclusion

At the beginning of this chapter, we set out to answer key questions about human decision making: Why are certain choice options favored over others, why do people make different choices when offered equivalent options, and which types of errors commonly occur when people make decisions? Taken together with the models and theories outlined earlier, knowing which strategies people employ and which biases they produce now gives us the means to describe, explain, and predict how decision makers will choose between choice options when facing certain types of decisions.

The theories we have reviewed expect decision makers to favor safe and unambiguous choice options, to pursue options with higher EVs, to adopt simplified choice strategies, or to emphasize gist-based processing when possible. A fundamental finding is that decision makers shift their choice preferences in accordance with the standard framing effect. However, many individuals are not "average" decision makers: which choice options or strategies people consider desirable will depend on their risk and reward preferences, their subjective appraisal of objective choice characteristics such as magnitude and probability, their cognitive resources and processing style, and their susceptibility to both standard and reverse framing. Importantly, these determinants can evolve across the lifespan and in response to situational demands and constraints, suggesting that the same individual may well employ different strategies or come to a different decision when facing the same choice twice. In sum, decision making is a complex interplay between the choice at hand, the decision maker's individual makeup, and the context in which the decision is made, including time constraints and whether there is information available that can help people to come to a decision.

#### Summary


Whereas expected utility theory considers subjective evaluations of choice outcomes, subjective expected utility theory also accounts for subjective evaluations of probabilities. Prospect theory combines both of these assumptions with additional hypotheses about reference points, sensitivity to changes from that reference point, and the influence of gains versus losses.
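These prospect-theory components (diminishing sensitivity, loss aversion, probability weighting) are often written in the parametric form estimated by Tversky and Kahneman (1992); the sketch below uses their commonly cited parameter values, which are an assumption here rather than something stated in this chapter:

```python
def value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function: concave for gains, convex
    and steeper for losses (loss aversion: lam > 1). x is the
    change relative to the reference point."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p, gamma=0.61):
    """Probability weighting function: overweights small and
    underweights large probabilities."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

# A loss looms larger than an equal-sized gain...
assert abs(value(-100)) > value(100)
# ...and rare events are overweighted, near-certain ones underweighted.
assert weight(0.01) > 0.01 and weight(0.99) < 0.99
```

The subjective value of a gamble is then the weighted sum `weight(p) * value(x)` over its outcomes, replacing the objective probabilities and amounts used by EV.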


#### Review Questions


#### Hot Topic

Julia Nolte

Julia Nolte, M.Sc., is a Ph.D. student at Cornell University's Department of Human Development. Her research interests span risk perception, decision making, lifespan development, and health. At the moment, Julia is working on tailoring health and risk information to the preferences and needs of different age and patient groups. One line of this research addresses how healthy adults and arthritis patients make healthcare choices when provided with varying information formats (i.e., written information, icon arrays, metaphors). As part of this research, Julia evaluates whether decision makers' reactions to different presentation formats depend on interindividual differences in numeracy and educational attainment. Another line of Julia's research investigates to what extent younger and older adults' information seeking depends on the type of information provided. Specifically, Julia researches the influence of quantitative (verbatim) and qualitative (gist) information on information acquisition.

David Garavito is a JD/PhD candidate and researcher at Cornell University who, using fuzzy-trace theory (FTT), examines developmental trends in memory and decision making. A large portion of his research focuses on cognitive theory and neuroscience, with his main research focus on the perception of decisions involving brain injuries in sports, as well as the short- and long-term effects of concussions and sub-concussive head injuries on decision making and memory. Using both temporal discounting and framing tasks, David examines concussion-induced deviations from developmental trends in decision making and memory and how they may serve as a warning sign for the development of possible neurodegenerative diseases later in life. Additionally, David uses structural and functional MRI data to test theoretical predictions relating neural development, functional activation, and decision making. David seeks to empirically test the predictions of the various dual-process theories, with a specific focus on FTT.

David Garavito

Valerie Reyna is the Lois and Melvin Tukman Professor of Human Development, Director of the Human Neuroscience Institute, Director of the Cornell University Magnetic Resonance Imaging Facility, and Co-Director of the Center for Behavioral Economics and Decision Research. She has been elected to the National Academy of Medicine and the Society of Experimental Psychologists, and has served as President of the Society for Judgment and Decision Making. Her research integrates brain and behavioral approaches to understand and improve judgment, decision making, and memory across the lifespan. Her recent work has focused on the neuroscience of risky decision-making and its implications for health and well-being, especially in adolescents; applications of artificial intelligence to understanding cancer genetics; and medical and legal decision making (e.g., jury awards, medication decisions, and adolescent crime).

Valerie Reyna

#### Latest Research Highlights

*Differences between description and experience, such as greater risk aversion when gains are described verbally rather than experienced.* For example, suppose that the rate of car thefts is about 1 in 10 in a city (9,989 for every 100,000 people); many people would buy insurance in this situation to protect against the risk of car theft. However, suppose you left your car unlocked for months and never experienced car theft or any other problem with your car. Not experiencing the statistically rare outcome of car theft (or experiencing it rarely) tends to lower the perception of risk, compared to describing the risk verbally.

*Developmental reversals, a growing list of heuristics and biases that emerge with development from childhood to adulthood, contrary to traditional cognitive theories.* For example, given a choice between winning two prizes for sure versus spinning a spinner to win four prizes or nothing, most adults choose the sure thing. However, when given four prizes and offered a choice between losing two prizes for sure or spinning a spinner to lose four prizes or nothing, most adults choose the risky option. Adults avoid the sure loss even when the total number of prizes is a net gain. This bias is not present in children; they pick the risky option about 70% of the time for both gains and losses, responding to the objective outcomes when they are explained simply and displayed clearly. FTT explains framing effects in terms of the qualitative gist of the options (get something or take a chance on getting nothing), as opposed to objective (verbatim) tradeoffs between risk and reward.

*Dual-process models and their counterarguments, including differentiating different kinds of dual processes.* For example, people with higher processing capacity who think carefully will sometimes censor their responses to gains and losses, making their choices more consistent, when they are presented with both gains and loss versions of the same decision. Dual-process theories suggest that this censoring is an example of deliberative System 2 thinking inhibiting intuitive System 1 thinking.

#### References

Cozmuta, R., Wilhelms, E. A., Cornell, D., Nolte, J., Reyna, V. F., & Fraenkel, L. (2018). The influence of explanatory images on risk perceptions and treatment preference. *Arthritis Care and Research*, *70*(11), 1707–1711. doi:10.1002/acr.23517

Garavito, D. M. N., Weldon, R. B., & Reyna, V. F. (2017). Fuzzy-trace theory: Judgments, decisions, and neuro-economics. In A. Lewis (Ed.), *The Cambridge Handbook of Psychology and Economic Behavior* (pp. 713–740). Cambridge University Press. doi:10.1017/9781316676349.026

# References


e/faculty/research/Fischhoff-Hindsight-Early-History.pdf


Reyna, V. F., Chick, C. F., Corbin, J. C., & Hsia, A. N. (2014). Developmental reversals in risky decision making: Intelligence agents show larger decision biases than college students. *Psychological Science*, *25*(1), 76–84. doi:10.1177/0956797613497022



Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. *Psychological Review*, *90*(4), 293–315. doi:10.1037/0033-295X.90.4.293


# Glossary



# Chapter 11

# The Nature of Language

LISA VON STOCKHAUSEN & JOACHIM FUNKE

University of Duisburg-Essen & Heidelberg University

Imagine you meet a friend after the summer and start a conversation about your holidays. In a fluent and easy exchange you get a lively idea of your friend's experience, what the places she went to looked like and even how the people were. Your friend in turn shares your embarrassment about a mishap you had at the airport when you mistook another passenger's suitcase for yours. This situation reveals a lot about the nature of language: the common ease and fluency when we use it, the context of conversing with others as its most natural and frequent use, the vividness and detail with which we can express and also understand things that are no longer present.

More formally, language can be described as a system of symbols by means of which human beings express an infinite variety of content, given finite resources. Its enormous expressive power is based on three basic features: First, meaningful elements are created from a set of units which themselves are not meaningful (the so-called duality of patterning; the phoneme *s*, e.g., is not meaningful by itself but together with the phonemes *k* and *y* it may form the meaningful unit *sky*). Second, an infinite set of sentences can be created from a finite set of rules (productivity or generativity of language). Finally, there is the feature of displacement, meaning that we can express anything irrespective of it being present in the current moment (Fitch, 2010; Hockett, 1959). Language is not bound to one modality but can be spoken, written, or signed.

The faculty of language evolved not only in response to biological but also to cultural demands and is continuously adapted and changed through cultural transmission. This results in a great diversity of forms and structures, so that the notion of universal features shared by all languages (so-called language universals) is nowadays highly controversial (Evans & Levinson, 2009; Hauser, Chomsky & Fitch, 2002; Levinson & Evans, 2010). Using the sentence "The farmer kills the duckling" as an insightful example, Edward Sapir (1921, pp. 65–96) showed how conceptual information is coded in the grammar of the English language: The *noun* "farmer" signifies a doer, the *verb* "kills" marks something being done. The categories *subject* and *object* tell us who initiates and who receives an action. The grammatical category of *number* indicates how many of a kind were involved. The category of *tense* tells us when the event happened, etc. Sapir identified thirteen of these grammatical-conceptual units in this sentence. In certain other languages, this information may be coded differently or not at all, whereas still other languages may grammatically code aspects that are missing in English (e.g., was the action observed or only known by hearsay?; see Chapter 12, "Language and Thought").

While languages differ widely regarding forms and structures, at their core, they all enable a detailed and abstract representation and description of the world that is independent of the current context. Against this background, in the present chapter, we discuss the nature of language as a means which allows for representation and communication, which in turn lie at the core of the human faculty of thinking and problem solving.

First, we lay the groundwork with sections on how languages are usually learned (language acquisition; implicit grammar learning) and how we deal with more than one language (bi-/multilingualism). We then turn to research on language as a tool for representation and communication (language as embodied simulation; alignment in dialogue). We close with a look at studies that show the role of language in shaping our views of the social world.

## 11.1 Language Acquisition

How do people acquire language? If you had been born and grown up in China, would you be able to speak Chinese? The answer to this question is obviously "yes". According to the hypothesis of Noam Chomsky, an American linguist, people are born with a universal grammar, a "language acquisition device" (LAD; see, e.g., Chomsky, 2011). Most importantly, learning a first language is possible without much instruction (implicit learning, grammar learning). Arthur Reber is an American researcher who first analyzed this process of implicit learning by means of artificial grammars (see Textbox 11.1 below and Reber, 1967, 1989).

#### Textbox 11.1: Implicit learning with artificial grammars

People learn the (inherently complicated) grammar of their first language L1 without explicit instruction. How is this possible? To experimentally research the processes behind grammar learning, Arthur Reber had the idea of using artificial grammars. Grammars are sets of rules; in the case of language, for example, rules for the correct positions of words in sentences. A correct sentence can be understood as a series of transitions between different types of words. Instead of words, Reber decided to set up a simple grammar that defines transitions between letters. See, for example, the following graph with six nodes (S0 to S5). The labelled arrows indicate the transitions between nodes which are allowed according to the grammar. All transitions here are unidirectional; the arrows' labels are the letters A to H, respectively:

Figure 11.1: Artificial grammar with six nodes, S0 to S5, and eight unidirectional transitions between them shown as labelled arrows; the labels are the letters A to H, respectively. The graph is similar to the one used by Reber (1967).

Imagine you start working with the graph at S0, where you can choose either the way up to S1, thereby producing an "A", or the way down to S2, producing a "B". Suppose we choose S1 and move on to S3 (we have no other choice), producing a "C". At S3, we could either stay there and produce a single "E" (or a series of them), or move on to S5, producing an "F" and reaching the end. We have produced a trail of letters: ACEF. This is a letter sequence that is compatible with the shown grammar. Other acceptable letter sequences could be BDG, BHCEEF, or ACEEEEF. A sequence like ABCD would be just as incompatible with this grammar as GDB, since according to the shown grammar, there is no arrow from G to D or from D to B. Reber found that participants who were presented with sequences of letters, without being told about the underlying rules, could differentiate compatible from incompatible sequences at a rate far above chance, despite being unable to explain their reasons (i.e., they were not aware of the hidden structure of the grammar). Reber concluded that participants had learned about correct and incorrect sequences in an implicit way: they could not explicitly give reasons for their grammaticality judgements, but their above-chance decisions showed that they had learned the rules of transition.
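The grammar in Textbox 11.1 is a small finite-state machine, and the membership test that Reber's participants learned implicitly can be made explicit in a few lines. The transition table below is inferred from the worked examples in the textbox (ACEF, BDG, BHCEEF, and ACEEEEF accepted; ABCD and GDB rejected); the exact graph in Reber (1967) may differ in its details:

```python
# State transitions: (current node, letter) -> next node.
# Inferred from the examples in Textbox 11.1; an assumption, not
# a verbatim copy of Reber's (1967) original graph.
TRANSITIONS = {
    ("S0", "A"): "S1", ("S0", "B"): "S2",
    ("S1", "C"): "S3",
    ("S2", "D"): "S4", ("S2", "H"): "S1",
    ("S3", "E"): "S3", ("S3", "F"): "S5",
    ("S4", "G"): "S5",
}
END = "S5"

def is_grammatical(sequence, start="S0"):
    """Follow the labelled arrows letter by letter; a sequence is
    grammatical only if every transition exists and the walk ends
    at the final node S5."""
    state = start
    for letter in sequence:
        state = TRANSITIONS.get((state, letter))
        if state is None:          # no such arrow: reject
            return False
    return state == END

for seq in ["ACEF", "BDG", "BHCEEF", "ACEEEEF"]:
    assert is_grammatical(seq)
for seq in ["ABCD", "GDB", ""]:
    assert not is_grammatical(seq)
```

What participants acquire implicitly thus corresponds to an explicit lookup table plus a walk through the graph; the experimental point is precisely that they can apply such a rule system without being able to state it.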

The Leipzig-based anthropologist Michael Tomasello developed another idea concerning language acquisition. He argues for a Usage-Based Theory (UBT; Tomasello, 2003) without innate grammar detection. Instead, more general cognitive "modules" come into play. Children use their innate faculty to categorize, to use analogies, and to understand action intentions. Through listening in social interactions, within a context of joint attention where the child and adult(s) coordinate their attention toward each other and toward a third object, children extract grammatical categories and rules. They first produce simple constructions (e.g., *There is x, I x this*) which they apply by analogy to new situations. Further on in the acquisition process, they then combine the constructions into more complex utterances (*There is the X that mummy Yed*). The UBT offers a challenging alternative to the idea of innate grammar learning.

Typically developing individuals acquire a language by passing through a sequence of stages. In a rough sketch, it starts with the first sounds, followed by babbling, then the first words ("milk"), then a two-word stage ("sit chair") up to full use of language. Acquisition of syntactic rules and a growing size of vocabulary is part of this sequence. An important fact is that there are sensitive periods for the different stages.

An interesting question concerns the ability of primates such as chimpanzees to learn a language. There have been many attempts to train chimpanzees. The most prominent case of alleged language acquisition in chimpanzees is reported by Gardner and Gardner (1969). They trained an infant female chimpanzee named "Washoe" to use the gestural language of the deaf, American Sign Language (ASL). After 22 months of training, Washoe could use 30 signs appropriately and spontaneously. Transfer to new referents as well as combinations and recombinations of signs were observed. From other studies it is known that nonhuman primates can indeed learn to manipulate symbols to gain certain rewards (Snowdon, 1990). But in the end, there remain not only quantitative but also qualitative differences between chimpanzees and humans. Examples are meta-language (i.e., speaking about speaking) and figurative language, including the understanding of irony, neither of which has been found in animals at all.

## 11.2 Bi- and Multilingualism

Is there a price to pay if a child grows up, for example, with parents who speak two different languages, or if a child grows up in Germany (learning German as their first language, L1) and then, say at the age of 3, moves to the US to learn English as a second language, L2? The case of bilingualism (two languages) or multilingualism (more than two languages) is an interesting and rather common phenomenon.

The *Critical Age Hypothesis* states that during the first years of life, a person would learn any language as L1, given that enough verbal stimulation is present. After that critical period, learning another language requires much more attention and explicit instructions. This points to different mechanisms of acquisition and learning behind L1 and L2 respectively. The hypothesis, including critical remarks, is described in more detail by Vanhove (2013).

In the beginning of bilingualism research, the assumption was that a bilingual person might have disadvantages due to the increased load of keeping two language systems separate. In later research, however, it was hypothesized that bilinguals might have advantages through better trained executive functions (EF) (e.g., Bialystok, Craik, Klein, & Viswanathan, 2004). These EF are part of self-regulation and thought control. Traditionally, they comprise three control functions: updating of information, shifting/switching of attention, and inhibitory control of distractions. By now, several studies have challenged the hypothesis of cognitive benefits through bilingualism. Paap et al. (2015, 2017), for example, tested the hypothesis that bilinguals might have advantages in EF and found no evidence for such positive effects. However, the debate is ongoing and recent publications again argue for cognitive benefits of multilingualism (cf. Quinteros Baumgart & Billick, 2018).

## 11.3 Language as Embodied Simulation

It is commonly agreed that language recruits neurological structures that have been around far longer than language itself (Zuidema, 2013). The theoretical approach of language as embodied simulation draws upon this idea: it assumes that language processing is principally grounded in sensorimotor experience and shares representational formats with non-linguistic processes, such as perceiving and acting (see Chapter 5, "Knowledge Representation and Acquisition"). Such experiences leave traces in our minds that become reactivated in comprehending language. Thus, language comprehension basically *is* simulating the reality that is being described linguistically (Zwaan & Madden, 2005). This contrasts with the traditional view of language comprehension as the process of manipulating abstract symbols and creating amodal representations, which only then interact with other cognitive systems (Weiskopf, 2010).

Empirical evidence supporting the embodied language view stems from experiments that show effects of linguistically described reality on people's behavior and on neuronal responses which cannot easily be reconciled with the idea of amodal and abstract representations (Buccino, Colage, Gobbi, & Bonaccorso, 2016). To give an example of the effect of *appearance* (cf. Stanfield & Zwaan, 2001), a description of a nail that is pounded into a wall implies a different orientation of the nail than one of a nail pounded into a floor. After reading respective descriptions of objects, participants had to determine if a presented picture showed an object mentioned in the previous sentence. Response times were shorter when the verbal description matched the appearance of the object in the picture, suggesting that the appearance of an object in a described context is part of the mental representation of the sentence, even when it is in no way relevant to solving the task. Besides their appearance, objects are also characterized by their pragmatic features that determine how and for what purpose we deal with these objects. Studies on the role of these pragmatic features, also called *affordances* of objects, in comprehension showed that participants processed information faster or found it more sensible when it matched affordances of aforementioned objects (e.g., filling a sweater with leaves (afforded) versus water (not afforded) to substitute for a pillow; Glenberg & Robertson, 2000).

Effects of *action compatibility* can be observed when an experimental task comprises movements toward or away from the body that are compatible or not with a linguistically described movement. Action compatibility effects even occur for abstract movements such as radioing a message or telling a story (Glenberg & Kaschak, 2002). Further studies support the notion that not only simple and concrete objects are subject to simulation. Objects that are part of negated sentences appear to be simulated, too (e.g., *There was no eagle in the nest/sky*; Kaup et al., 2007), as are contents of figurative language and abstract concepts (e.g., the balance of justice, cf. Gibbs & Perlman, 2010).

Recent discussions of the approach of embodied language focus less on the question of whether representations are in principle either grounded in sensory experience or symbolic but rather on the degree to which language users simulate the content of linguistic input during comprehension. This seems to depend on their expertise regarding the content, their linguistic skill, and the content itself, as, for example, the description of a cooking show is easier to simulate than the content of a legal document (Zwaan, 2014, 2016). Neurophysiological data tentatively but not yet conclusively support the notion of language processing as simulation (e.g., Buccino et al., 2016; Mollo, Pulvermüller & Hauk, 2016).

This approach highlights the possible role of language in problem solving. Capturing a problem space in language does not necessarily translate it into an abstract amodal code but may rather help to properly represent perceptual aspects of the situation, spatial relations, temporal or spatial dynamics, or a perceiver's perspective by simulating what is being described in an experience-based manner. In this sense, solving problems that comprise sensorimotor aspects should benefit from experience-based simulation through language.

## 11.4 Alignment in Dialogue

Dialogue represents the most natural use of language and is closest to the conditions of its early stages in evolution (as opposed to monologue, as well as reading and writing; written language was invented only about 7000 years ago; Zuidema, 2013). In terms of problem solving, dialogue is a powerful form of action that enables the exchange of ideas, joint planning, and transfer of experience and expertise independent of context (see Chapter 5, "Knowledge Representation and Acquisition"). From a psycholinguistic point of view, dialogue is characterized by a constant exchange between interlocutors, requiring listeners to be prepared to speak, and speakers to listen throughout the process.

Traditionally, language production and comprehension have often been studied separately, and mostly out of social or even out of larger linguistic context. Representations underlying production and comprehension were not considered to be necessarily linked, and the separate stages of planning an utterance—from preverbal concepts via syntactic, lexical and phonological encoding to phonetic realization (cf. Levelt, 1989)—were supposed to be irrelevant to the listener. The listener would in turn create their representation of the utterance in stages—from decoding sounds through to a conceptual understanding—which used to be considered irrelevant to the speaker. An interesting question is how a dialogue, with its constant changes of roles in real time and its overlapping complex processes of interpreting and planning, can be so easy and effortless, even for young children.

Pickering and Garrod (2004) proposed an interactive alignment model of dialogue stating that interlocutors adjust and align their representations on all linguistic levels: on the level of the situation model representing the described content in context, on the level of syntax and the lexicon, through to the level of articulation and speech rate. According to this approach, comprehension and production draw upon the same representations and are based on closely intertwined processes, not only intra- but also interindividually. Each linguistic level in a speaker's utterance influences the respective level of the listener in comprehension and in turn the planning of following utterances. A word that has been comprehended is more likely to be produced; repeating a word or using semantically similar words enhances the alignment of syntax, and so forth (see Figure 11.2; cf. Garrod & Pickering, 2009). When, in a dialogue, the listener becomes the speaker, the process continues with changed roles. The ease with which this constant role change takes place and with which interlocutors, for example, complete each other's utterances is enabled by highly automatized priming processes, with each level on the production side priming the respective level of the hearer on the comprehension side. The result is a high level of repetition and a high level of imitation in dialogue (Pickering & Garrod, 2004).

Figure 11.2: The figure shows the different levels of linguistic representation involved in language comprehension and in language production and the relatedness between the levels within and between two interlocutors A and B in dialogue, according to the interactive alignment model (Fig. 2 from Pickering & Garrod, 2004; reproduced by permission of Cambridge University Press).

The approach of interactive alignment locates and studies language in the context of its function as an action (Garrod & Pickering, 2009; Pickering & Garrod, 2013). It thereby places language in line with other strategies of coordinating one's behavior with others, based on perception-action links (Garrod & Pickering, 2004). Perceiving a facial expression or body posture often results in (overt or covert) imitation (Dijksterhuis & Bargh, 2001). In a similar way, comprehending language (i.e., perceiving) goes along with emulating the interlocutor's action of language production; this in turn facilitates one's own production (Pickering & Garrod, 2013). As Dijksterhuis and Bargh (2001, p. 3) put it, "In sum, perception is for doing".

Alignment in dialogue has been studied in diverse paradigms. Garrod and Anderson (1987) presented participants with a maze game in which dyads of players, seated in separate rooms, had to find their way through a maze made up of paths, nodes, and gates to reach a goal. Coordination was required because each player could see only his or her own position in the maze, and the movements of one player could change the configuration of gates in the partner's maze. Players were thus motivated to work out each other's position in dialogue and to coordinate their movements toward the goal. The dialogues showed that, without explicit negotiation, partners quickly converged on specific representations of the maze and corresponding ways of describing it (e.g., by moving along a path, by referring to a line intersection, or by describing a subsection of the maze figuratively). These patterns of description changed between games, suggesting that they emerged locally in a specific dialogue through alignment.

In an experimental demonstration of syntactic alignment, Branigan, Pickering, and Cleland (2000) developed a "confederate scripting technique", in which two persons participated in a dialogue about pictures depicting actions involving an agent, a patient, and a beneficiary. One participant was a confederate who described the depicted scene with systematically varying syntactic structure (*The A gives/hands/offers/. . . the B to the C* or *The A gives/hands/offers/. . . the C the B*). The study showed that participants adjusted their syntax to the confederate's, for example, tending to use a prepositional phrase when the confederate had just used one. The effect was stronger when confederate and participant used the same verb, but it also occurred between descriptions with different verbs.

The universal quality and robustness of alignment has been underlined by studies showing effects across modalities (people align their speech styles to words that they listen to or lip-read; Miller, Sanchez, & Rosenblum, 2010) as well as across languages (as shown in code switching by Kootstra, van Hell, & Dijkstra, 2010, and in dialogues of bilingual speakers with differing L1s and a shared L2; Trofimovich & Kennedy, 2014).

As mentioned earlier, alignment is assumed to happen implicitly and automatically. This contrasts with other views of dialogue. Coordination in dialogue, for example, has traditionally been attributed to common ground (Clark, 1996): shared knowledge based on communal experience (such as culture, language, or ethnicity) and on personal experience. In the traditional view, common ground has to be established and updated in working memory to keep a dialogue aligned. In the interactive alignment framework, however, common ground is created bottom-up through what is shared between interlocutors. Well-aligned interlocutors do not have to infer meaning because they both sample from very similar representations, including situation models. Only in case of apparent misalignment may common ground be established as an explicit strategy; it is then part of a repair process, not of the regular process of alignment (Pickering & Garrod, 2004).

The fact that linguistic behavior is deeply embedded in a larger social and behavioral context is underlined by findings that show the influence of non-linguistic factors (e.g., gender or quality of a relationship) on the degree of conversational convergence (Gambi & Pickering, 2013; Pardo, Gibbons, Suppes, & Krauss, 2012).

The interactive-alignment model of human dialogue underlines the deeply social function of language, which means efficiently communicating and coordinating with our fellow human beings. In the final part of the chapter, we further broaden our perspective on the social nature of language and present evidence for its influential role in shaping our understanding of social reality.

# 11.5 The Role of Language in Representing and Constructing Social Reality

Because language use is both ubiquitous and automatized, the influence of language on representing and constructing social reality is both powerful and subtle (see Chapter 12, "Language and Thought"). Several linguistic biases have been identified in the literature. Semin and Fiedler (1988) proposed the *Linguistic Category Model* stating that different kinds of descriptions of persons and their behaviors vary in terms of abstractness. This, in turn, affects how informative a description about a person is and how temporally stable a described quality is perceived to be. For example, *descriptive action verbs* refer to a particular activity in a specific situation (e.g., *kiss, talk, stare*) and do not reveal lasting features of a person. *Interpretive action verbs* (such as *help, inhibit, imitate*) still refer to observable actions that, however, belong to a more general class of behaviors and require interpretation. Still more abstract is a description with *state verbs* referring to mental or emotional states with no clear beginning and end (*hate, like, notice*). Finally, descriptions based on *adjectives* (e.g., *honest, reliable, creative*) abstract characteristics from observable behavior and a concrete context and assign dispositional qualities that are rather stable over time.

Relying on this model, studies on *Linguistic Intergroup Bias* showed that descriptions of persons and their behaviors differ in their level of abstractness, depending upon the person belonging to an observer's ingroup or outgroup and on the behavior being desirable or not (Maass, Salvi, Arcuri & Semin, 1989). Favorable behaviors by outgroup members are described in a more concrete way (e.g., *X helped somebody* as opposed to *X acted in an altruistic way*), implying that this behavior might not be stable over time. In contrast, undesirable behaviors of outgroup members are described in rather abstract ways (e.g., *X is being aggressive* as opposed to *X hit somebody*), inviting one to generalize from the situation and thus suggesting stability over time. Descriptions of ingroup behavior follow the opposite pattern, with desirable behavior being described more abstractly and unfavorable behavior more concretely. This implies stability of the desirable and dependence on the situational context of the undesirable behavior.

Further research suggests that the dimension of abstractness versus concreteness underlying person descriptions may reflect observers' expectations, a phenomenon called the *Linguistic Expectancy Bias* (Wigboldus, Semin & Spears, 2000). Behaviors that are expected on the basis of stereotypes about social groups are described on a more abstract level (*Alice is emotional*) and lead to inferences regarding a person's disposition, whereas behaviors that violate stereotypes and are therefore unexpected are described in concrete terms (*Paul brushes tears from his eyes*). Such behavior is instead attributed to the situational context rather than to the person's disposition. Besides the level of abstraction, the use of negation may also indicate whether a behavior is expected. Beukeboom, Finkenauer, and Wigboldus (2010) showed that participants used more negations to describe a behavior that violated stereotypical expectations (e.g., *Mary is not bad at math* rather than *Mary is good at math*). Furthermore, participants interpreted negations as indicating that a described behavior deviated from the speaker's expectancies (i.e., the speaker did not expect Mary to be good at math), attributed the behavior more strongly to situational than to dispositional factors, and evaluated negated descriptions as more neutral than affirmative descriptions (i.e., *being not bad at math* is not as good as *being good at math*; the analogue applies to negative attributes: *being not kind* is less unkind than *being unkind*).

Aspects of the interpersonal context have been shown to affect these biases, and in principle the biases can be used strategically, for example when the communicative goal is to convince an interlocutor or to mitigate a negative description (e.g., stating that someone is not smart is less offensive than saying he or she is stupid; cf. Beukeboom, 2014). In everyday communication, however, based on highly automatized processes of stereotype activation in language use, these biases operate implicitly, beyond people's awareness.

The research reviewed above shows that expectations based on social stereotypes are expressed linguistically in subtle ways. The following studies further underline how deeply interwoven language processing is with our ideas about (social) reality. Using eye-tracking during reading, and thereby assessing the process of understanding on a moment-to-moment basis, these experiments show that violations of our expectations concerning social reality slow down fundamental aspects of language comprehension, such as interpreting pronouns or assigning thematic roles.

In a study by Reali, Esaulova, and von Stockhausen (2015), participants read descriptions of typical activities of a person in a specific profession. The person was denoted by initials only, so that gender was not indicated. The professional role could be typically male, typically female, or neutral. A typically male description read, for example, *M.F. repairs and produces furniture, works with wood*. Each description was followed by a target sentence containing a personal pronoun that referred to the described person, such as *Usually he/she has a sufficient income*. When the pronoun was incongruent with the gender stereotype of the described role (such as carpenter + she, or florist + he), participants had greater difficulty resolving the pronoun, as reflected in longer fixation times. It is worth noting that gender was not explicitly indicated in the descriptions, nor did they contain role nouns that directly denote the profession. Thus, the gender-related expectations could only be based on the gender typicality of the described behavior. The effects were independent of participants' individual gender attitudes.

Esaulova, Reali, and von Stockhausen (2017) showed effects of expectations regarding gender-typical roles and behavior on the comprehension of thematic structures. Take as an example the two sentences *The flight attendant who observed many tourists is attentive* and *The flight attendant whom many tourists observed is attentive*. These sentences differ regarding the thematic roles that the protagonists take in the described action. In the first sentence, the flight attendant takes the agent role (i.e., initiates or causes the action) and the tourists take the patient role (i.e., receive the action). In the second sentence, the roles are swapped, with the flight attendant now receiving the action and the tourists causing it (taking the agent role). In the English translation of the materials, the thematic roles are clearly indicated by the relative pronoun *who/whom*. In the original German, however, both versions were identical until the end of the relative clause was reached and the verb form indicated who did the observing (a singular form in the case of the flight attendant, a plural form in the case of the tourists). That is, only after reading both nouns could participants resolve the ambiguity regarding the thematic roles. By then, they were expected to have built up expectations regarding agent and patient depending on the role nouns' gender typicality (flight attendant is a typically female role, tourist is neutral) and on their grammatical gender (masculine or feminine). Eye movements showed that participants took longer to resolve the relative clause and found it more difficult to assign the agent role to a role noun with feminine rather than masculine grammatical gender and to typically female as opposed to neutral role nouns. Feminine grammatical gender (which usually indicates female biological gender) and female gender typicality better qualified a noun for the thematic role of patient than agent, reflecting the strong link between masculinity and agency in gender stereotypes (Koenig, Mitchell, Eagly, & Ristikari, 2011).

The reported eye-movement effects occurred within the very first stages of understanding; they are based on highly automatized processes and are not strategically controlled (for replications, see Esaulova & von Stockhausen, 2015; Reali, Esaulova, Öttl, & von Stockhausen, 2015).

To summarize, implicit biases in language production and comprehension express social stereotypes and, in both listeners and speakers, lead to stereotype-congruent inferences (Beukeboom, 2014). It is in this sense that language not only reflects but also shapes and maintains social reality. The underlying mechanisms are deeply embedded in lexical, semantic, and syntactic features of language and in our use of them: the verbs we choose for a description, our use of negation, our interpretation of pronouns and relative clauses, our assignment of thematic roles. In this way, we are dealing with an essential aspect of the nature of language: representing and expressing our sense of reality.

# Acknowledgements

The authors would like to thank Dr. Yulia Esaulova for her insightful comments on a draft version of this chapter and Katharina Conrad for copy editing the manuscript.

#### Summary


#### Review Questions


#### Hot Topic

Lisa von Stockhausen

My research program addresses the question of how linguistic structures and cognitive processes reflect social reality. Specifically, in my lab we study automatic processes underlying the representation of gender in language. In our experiments, participants are confronted with linguistic input that may conflict with their expectations concerning social reality, such as men working in typically female occupations or taking passive (patient) thematic roles.

Using measurement methods with high temporal resolution (such as eye-tracking), we were able to show that violating expectations regarding social categories slows down language comprehension in its earliest stages, indicating the highly automatized ways in which social cognition is embedded in language.

Another area of my research concerns the cognitive mechanisms underlying mindfulness. The focus here lies on the questions of whether and how guiding one's attention (to the present moment and without judgment) can be trained, how this affects our basic faculty of attention regulation, and how it in turn affects processes of self-regulation.

#### References

Esaulova, Y., Reali, C., & von Stockhausen, L. (2017). Prominence of gender cues in the assignment of thematic roles in German. *Applied Psycholinguistics*, *38*, 1133–1172. doi:10.1017/S014271641700008X

Garnham, A., Oakhill, J., von Stockhausen, L., & Sczesny, S. (2016). Editorial: Language, Cognition, and Gender. *Frontiers in Psychology*, *7*, 772. doi:10.3389/fpsyg.2016.00772

Wimmer, L., Bellingrath, S., & von Stockhausen, L. (2016). Cognitive effects of mindfulness training: Results of a pilot study based on a theory driven approach. *Frontiers in Psychology (Section Consciousness Research)*, *7*, 1037. doi:10.3389/fpsyg.2016.01037

### References




# Glossary


# Chapter 12

# Language and Thought

#### ANDREA BENDER

University of Bergen

Language is not indispensable for thought. Nonhuman animals solve complex cognitive tasks while lacking anything close to the human communication system; and human children achieve incredible cognitive feats long before they are able to participate in conversations. Still, language is our most powerful tool for the bulk of cognitive activities we frequently engage in, from the categorization of our perceptions to the planning of our actions. But the language we speak as our mother tongue is also a tool with a history and a specific shape. It is structured through and through, in ways that differ from one language to the next, and it comprises classification systems, sets of contrasts, and requests for specifications that would seem to afford and suggest some lines of thought more easily than others. As we know from research on problem solving (see Chapter 9, "Problem Solving"), tools are typically used in a specific context and for specific purposes, while using them in novel ways is challenging for humans (e.g., Duncker, 1935). A similar phenomenon might therefore be expected for language when used as a tool for describing observations, categorizing them, or drawing inferences from them. This analogy raises a tantalizing question: Do speakers of different languages develop different views of the world?

Suggestive phrasings that influence interpretation and memory (Carmichael et al., 1932; Loftus & Palmer, 1974), requests for using gender-neutral or -inclusive language to reduce gender discrimination (Irmen & Kurovskaja, 2010; Prewitt-Freilino et al., 2012), and the finding that repetitions of wrong statements make them sound true (the *illusory truth effect*; Bacon, 1979; Hasher et al., 1977), all attest to the power that language can unfold in shaping the social world (see Chapter 11, "The Nature of Language"). But does it gain this power by only shaping the world we live in or also by directly affecting our cognition? To what extent do systematic differences between languages and their grammatical structures cause differences in how their speakers perceive, categorize, reason, or make decisions?

To address these questions, we first present the *principle of linguistic relativity* and its various readings (sections 12.1 and 12.2). Two of the most plausible readings are then examined in more detail, illustrated with one example each: color perception and numerical cognition (section 12.3). Against this backdrop we then elaborate on the role of language as a cognitive tool (section 12.4).

# 12.1 The Principle of Linguistic Relativity

The idea that language may affect thought can be traced back at least to the 18th century, to scholars like Johann Gottfried Herder (1744–1803) or Wilhelm von Humboldt (1767–1835). Today, however, it is most strongly associated with the names of ethnolinguists Edward Sapir (1884–1939) and Benjamin Lee Whorf (1897–1941), which is why

the idea in more recent literature is often referred to as the "Sapir-Whorf hypothesis" or the "Whorfian hypothesis". Whorf (1956) himself used the term "principle of linguistic relativity", deliberately in the style of Einstein's *principle of relativity* in physics—for two reasons. First, Whorf made a claim similar to Einstein's, namely, that objective or absolute descriptions of the world independent of a given viewpoint are impossible (hence "relativity"), in this case because our perceptions and categorizations are influenced by the linguistic structures implicit in our native languages. Second, Whorf considered this linguistic relativity a premise for research, not its target (hence a "principle" and not a "hypothesis"). In the cognitive sciences, and specifically in cognitive psychology, this idea remains highly controversial to this day.

# 12.1.1 Fundamental Theses

The principle of linguistic relativity is based on three general theses (Lucy, 1997; Wolff & Holmes, 2011):

1. Languages differ in the way they categorize and describe the world.
2. These linguistic categories organize and structure their speakers' perception and thinking.
3. Consequently, speakers of different languages perceive and think about the world differently.

Let us illustrate this argument with a concrete example. Even closely related languages differ with regard to the classes into which they sort their nouns (Thesis 1). So-called formal gender languages assign a grammatical gender to every single noun. Romance languages like French, Italian, or Spanish, for instance, contain two of these classes: *masculine* and *feminine*. Other Indo-European languages like Greek, German, or Russian make use of a *neuter* in addition to the masculine and feminine. Parts of Norwegian, Swedish, and Dutch conflate two of these, namely masculine and feminine, into a *common gender* in opposition to the neuter gender. And English has given up all gender distinctions (at least those that are not grounded in biological gender) and hence is no longer a formal gender language. Still, it differentiates gender in personal pronouns ("he", "she", "it"), but even this apparently basic categorization is not a linguistic universal. Polynesian languages such as Tongan, for instance, distinguish between a single person, pairs of persons, and groups of persons, and between selections of people that do or do not include the addressee, but they do not mark gender at all.

According to Thesis 2, linguistic categorizations like gender classes or inclusion criteria for personal pronouns help us to organize and structure the "kaleidoscopic flux of impressions" in which the world is presented (Whorf, 1956, p. 213f.). Since the linguistic categories constitute the largely indiscernible background against which our conscious considerations take place, they do their organizing work without our noticing it. It may appear only consistent, therefore, to assume that the linguistic categories of the language in which one forms one's thoughts would contribute to shaping those very thoughts. Applied to our example, such organizing would be at work if speakers of German associated the sun more strongly with female attributes because *die Sonne* is feminine, and the moon more strongly with male attributes because *der Mond* is masculine—in contrast to, for instance, speakers of Spanish, for whom *el sol* is masculine and *la luna* feminine (Koch et al., 2007).

Thesis 3, finally, implies that speakers of German differ in their associations of sun and moon from speakers of Spanish precisely because the two languages assign reversed grammatical genders to these two words. Examples like these are the key target of crosslinguistic studies on linguistic relativity, and we come back to findings from such studies in section 12.2.

# 12.1.2 Do Languages Differ in their Description of the World?

As you may have noticed if you read Chapter 7 on deductive reasoning carefully, the three theses form a syllogistic argument in which Thesis 3 follows logically from Theses 1 and 2. In other words, if the first two theses are considered true, the third one must also be considered true. But even the apparently least controversial Thesis 1 was rejected for several decades in both psychology and linguistics. Distinguished scholars advocated the position that the commonalities of human languages, which they attributed to a "universal grammar" module in humans, by far outweigh the differences between languages (e.g., Chomsky, 1986; Pinker, 1994). On this account, the diversity in, for instance, gender categories and gender assignment to nouns across languages would be considered a minor detail, irrelevant to how people perceive objects and their properties.

With more in-depth investigations of a broad range of languages, however, the differences between languages are now taken more seriously (Dabrowska, 2015; Evans & Levinson, 2009), and some of these differences are involved in coding and emphasizing relevant information in sensible, if language-specific, ways. For instance, when expressing motion by way of verbs, some languages (such as English, Russian, or Chinese) emphasize the manner of the movement over its path, whereas others (including Spanish, Greek, or Japanese) emphasize path over manner. For illustration, compare the following two sentences, which are borrowed from Papafragou and Selimis (2010, p. 227, footnote 2):


In (1), the emphasis rests on the *manner* of motion (here: flying, in contrast to, say, tripping), while in (2) it rests on the *path* of motion (exiting, in contrast to, say, entering). In English, the manner of motion is expressed by the verb itself ("flying"), whereas path information is expressed by way of a preposition ("out of"). A roughly comparable statement in Greek (2) expresses information on the path of motion in the verb ("exiting"), whereas the manner of motion would have to be explicated by way of an attribute (here, "flying") or in a second sentence with a new verb.

Such language-specific differences are also documented for other types of semantically meaningful categories (such as tense in verbs, or the distinction between countable objects and substances; see Wolff & Holmes, 2011, for an overview). For this reason, the controversial debate has shifted in recent years and now focuses on Thesis 2: Does the specific way in which a language describes the world really affect the experiences of its speakers? In other words, do speakers of English and of Greek *perceive* motions in distinct and different ways?

# 12.1.3 Do Different Descriptions of the World Affect Our Experiences?

The most radical position with regard to the relation of language and thought is the position of behaviorism, as represented by John Watson. Watson conceived of thought simply as inner speech—a position that soon turned out to be untenable. The position at the opposite end of the spectrum is advocated, for instance, by Noam Chomsky, the linguist who became famous in the 1950s for crushing behaviorist accounts of language. The view he made popular is that language and thought are two entirely separate and distinct modules (e.g., Chomsky, 1986), hence precluding, by definition, any potentiality of linguistic relativity (Pinker, 1994). Prevailing for decades, this view still has supporters, but is slowly losing ground. Specifically, developmental psychologists in the tradition of Lev Vygotsky and Jean Piaget have been arguing that, while cognition and language may emerge and develop independently from one another, they become entangled later on in a complex relationship. This view is further supported by empirical evidence that cognitive development more generally spurs on language development (overview in Harley, 2014).

The principle of linguistic relativity differs from both the behaviorist and the modular position in that it considers language and thought neither as identical nor as entirely separate. Nor do proponents of linguistic relativity dispute that thought is possible without language or that it (at least ideally) precedes language use. However, they emphasize more strongly than others the possibility that properties of the language one speaks may also affect aspects of how one thinks. Unfortunately, neither Sapir nor Whorf elaborated their ideas on linguistic relativity into a coherent theory. As one consequence, research in this field is plagued to this day by a plurality of possible readings. Our attempt to systematize these readings follows the overview presented by Wolff and Holmes (2011).

One of the central dimensions on which possible readings differ concerns the question of whether language and thought are structurally parallel or different. The former case would support the position of linguistic determinism: we would be able to engage only in those thoughts that our language permits. This most extreme form of linguistic relativity—frequently associated with Whorf even though he himself was rather ambivalent about it (Whorf, 1956; and see Lee, 1996)—has little intuitive plausibility and is also refuted by empirical research (overview in Wolff & Holmes, 2011). This research has demonstrated clearly that thought is more strongly guided by properties of the world than by linguistic labels (Imai & Gentner, 1997; Malt & Wolff, 2010).

Even if one accepts that language and thought may be structured in distinct ways, linguistic relativity could still unfold in one of several ways (Wolff & Holmes, 2011; and see Figure 12.1). Depending on the specific reading, these would assume an influence of language during

• *thinking before language*, that is, when information is selected and prepared for verbalization ("thinking for speaking"),

• *thinking with language*, that is, when linguistic representations interfere with or extend ongoing thinking (language as *meddler* or as *augmenter*, respectively),

• *thinking after language*, that is, when linguistic effects linger on and thereby 'color' our thoughts in language-specific ways (language as *spotlight* and language as *inducer*).

In section 12.2, we explain in more detail the first and third readings (thinking before and after language), as they are closely associated, before turning to two examples of the second reading (thinking with language) in section 12.3.

# 12.2 Thinking before and after Language

# 12.2.1 Thinking before Language ("Thinking for Speaking")

Lexical items make differentiations possible, as for distinguishing between pastel green, moss green, and turquoise. Grammatical structures do not simply *afford* such differentiations but *require* them. When we put our thoughts into words—and even before we can begin doing this—a number of decisions need to be made. These include having available the relevant information that needs to be specified according to the grammatical rules of the language we intend to use. For illustration, take the categories of tense and aspect, which in many languages are realized in the verb. Both categories provide information on time, but they focus on different facets of time: *Tense* specifies the time in which an event takes place (e.g., in the past, present, or future), while *aspect* specifies how this event extends over time (e.g., whether it is ongoing versus terminated, or whether it is progressive versus habitual).

Figure 12.1: The most plausible ways in which language may affect thinking (according to Wolff & Holmes, 2011), and the relation between them.

In German, each verb form always requires an instantiation of tense—for example, the present tense in (3) and the past tense in (4)—but largely disregards aspect.


In English, aspect needs to be specified in addition to tense, with (5) indicating habitual action and (6) indicating progressive action:


In Tongan and other Polynesian languages, or in Chinese, neither tense nor aspect is expressed in the verb.

In order to form a grammatically correct sentence, speakers of German therefore need to specify *when* an event happens, but need not specify whether or not it is *ongoing*. In other words, the aim of expressing something in language forces us to pay attention to specific types of information while we may safely ignore others. This effect of language on thought, famously labeled "thinking for speaking" by Slobin (1996), emerges *before* language is actually used and can be observed in various domains (Gennari et al., 2002; Papafragou et al., 2008). A second example of this type is the distinct focus, described earlier (in section 12.1.2), that languages direct at either the manner or the path of motion (Papafragou & Selimis, 2010).

#### 12.2.2 Thinking after Language

If we accept an influence of thinking for speaking, it follows almost naturally that an entire life of occasions on which we need to verbalize our thoughts would form habits regarding what we pay attention to. These habits are likely to linger on in contexts in which the respective information is not immediately required for verbalization, that is, during thinking without an imminent need for speaking. By emphasizing some distinctions—say, with regard to time point or temporal course—more than others through compulsory grammatical categories (here: tense or aspect), languages would therefore still direct attention to the same types of information in a regular and sustained manner, like a spotlight (this is why Wolff & Holmes, 2011, dub this instance of thinking after language the *spotlight effect*). One of the instances Wolff and Holmes cite for this effect is the gender distinction mentioned earlier.

Yet, in terms both of theoretical plausibility and of empirical support, the category of grammatical gender has remained a rather controversial case. Semantic gender languages assign a gender only to living things that possess a biological gender (sex), whereas formal gender languages extend the gender distinction to all nouns, regardless of whether their referents have a sex. English is an example of the former, German of the latter. Both languages assign masculine gender to living things like "man", "son", or "rooster", and feminine gender to "woman", "daughter", or "hen". But while almost all inanimate things are neuter in English, a large proportion of them are categorized as masculine or feminine in German (e.g., *der Mond* [the.MASC moon], *die Sonne* [the.FEM sun]). Hence, as a formal class, grammatical gender does not reflect genuine differences in the world. It serves the purely linguistic function of generating congruence within sentences, particularly between the noun and its accompanying article and adjective, as in (7) and (8):


Notably, relatively few languages distinguish exactly the two genders of interest, namely masculine and feminine; many others categorize on different grounds, with some conflating the two (such as Swedish or Dutch) and some distinguishing up to 20 different genders (Corbett, 1991).

Gender distinctions have nevertheless attracted considerable interest as a subject for studies on linguistic relativity (e.g., Boroditsky et al., 2003; Konishi, 1993). A popular measure in these studies is the *gender congruency effect* (see Chapter 11, "The Nature of Language"). It emerges if the grammatical gender of a noun (e.g., masculine in the case of *Mond*) is congruent with the association of the respective referent with a specific sex (here: the moon as male). However, the more sophisticated the methods used for investigation (e.g., adopting implicit tasks instead of direct assessments), the more difficult it has proven to replicate the initially positive findings (overview in Bender et al., 2018).

For better-suited and more convincing examples of the spotlight effect, we therefore need to turn to domains in which linguistic categories do reflect—and make salient—genuine characteristics of the world. Only then do they have the potential and the power to habitually redirect attention to these characteristics.

One such example is spatial referencing. A frame of reference (FoR) is a linguistic tool for describing relations between entities. It provides a coordinate system for locating a thing (say, a ball) in reference to another thing (say, a boy) and comes in three types (Levinson, 2003): The *absolute* FoR is aligned with external fixed points such as the cardinal directions or a river; the *intrinsic* FoR is aligned with the point of reference (here the boy); and the *relative* FoR is aligned with the perspective of an observer. Importantly, languages differ in which of these FoRs they can use or prefer, and this in turn affects people's wayfinding skills, their co-speech gestures, how they memorize relations and orders, or how they think about time (Bender & Beller, 2014; Levinson, 2003; Majid, Bowerman, Kita, Haun, & Levinson, 2004).

# 12.3 Thinking with Language

Besides the obvious case of *thinking for speaking* and the likely case of the *spotlight effect*, both of which arise from the need to focus on information requested by one's grammar, two more readings of linguistic relativity have been investigated quite extensively in the past decades: one focusing on the possibility that linguistic representations enter into conflict or interfere with non-linguistic representations (language as *meddler*), the other focusing on the possibility that linguistic representations support, augment, or even make possible non-linguistic representations (language as *augmenter*). In both of these cases, it is the role of language as a cognitive tool that opens up an influence of language on thought.

# 12.3.1 Language as Meddler: The Case of Color Perception

Color is an excellent example for investigating the influence of language on perception and other cognitive processes because colors can be exactly defined and measured in terms of the wavelength of light. The color terms we use to denote different colors verbalize a categorical system that we impose on the physically unstructured color spectrum, and hence are a product of thought. The interesting question now is whether these color terms, once established, also impact on thought. That is: If two languages divide the color spectrum in different ways, will the speakers of these languages also perceive the colors in different ways?

That languages indeed differ in how they divide the color spectrum has been well known for half a century (Berlin & Kay, 1969), systematically documented in the *World Color Survey*, a large-scale research program at the University of California, Berkeley (http://www1.icsi.berkeley.edu/wcs/). This research program focuses on basic color terms: words that are elementary, generally applied, and broadly understood. In order to qualify as elementary, a color term needs to be a single word; composed expressions like "dark red" or "forest green" are therefore excluded. A color term is considered general if it can be applied to any kind of object; a term like "blond" is hence excluded because its usage is restricted to hair. Terms like "magenta" or "burgundy", finally, do not qualify because they are not widely known.

Following these specifications, English is considered to comprise eleven basic color terms: "black", "grey", and "white" for achromatic colors, and "red", "yellow", "green", "blue", "orange", "pink", "purple", and "brown" for chromatic colors (Berlin & Kay, 1969). Many languages have fewer basic color terms than English, but some languages also have more. For instance, English uses different terms for *green and blue*, whereas Welsh subsumes them under one term (Lazar-Meyn, 2004), and both Italian (Paggetti et al., 2016) and Russian (Davies & Corbett, 1994) distinguish *blue* further into a *light blue* and a *dark blue* (Table 12.1).

Do speakers of Welsh, English, Italian, or Russian therefore perceive the respective colors differently? This question can be investigated by selecting colors with equal intervals in their hue, so that the selected colors all fall into the same category in one language (e.g., *blue*), while they are separated by a categorical boundary in the other language, as with *goluboj* versus *sinij* (see Figure 12.2a). If the linguistic categorization has an impact on perception, the difference between hues that are separated by the categorical boundary should be overestimated, compared to the identical difference between two hues that belong to the same category. In our example, this would be the case for speakers of Russian, but not English. Adopting this strategy, several studies were able to demonstrate that such a categorical boundary does not influence color perception per se, but does influence other cognitive processes involved in similarity judgments, rapid distinctions between hues, the learning of new color categories, or the recognition of hues (Kay & Kempton, 1984; Mitterer et al., 2009; Roberson et al., 2005).
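The selection logic behind such stimuli can be sketched in a few lines of Python. Note that the boundary value and hue numbers below are purely hypothetical placeholders for illustration, not the calibrated stimuli used in the actual studies:

```python
# Hypothetical sketch of the stimulus logic described above: hues at equal
# physical intervals, where one pair falls within a single Russian category
# and another pair straddles the goluboj/sinij boundary. The boundary value
# and hue numbers are illustrative placeholders, not calibrated stimuli.
BOUNDARY = 210.0  # hypothetical hue value separating goluboj from sinij

def same_category(hue1: float, hue2: float) -> bool:
    """True if both hues fall on the same side of the categorical boundary."""
    return (hue1 < BOUNDARY) == (hue2 < BOUNDARY)

within_pair = (195.0, 205.0)   # both goluboj, 10 units apart
across_pair = (205.0, 215.0)   # goluboj vs. sinij, also 10 units apart

print(same_category(*within_pair))   # True  (same category)
print(same_category(*across_pair))   # False (boundary crossed)
```

The crucial design feature is that both pairs are physically equidistant; only the linguistic boundary distinguishes them.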

One such study (Winawer et al., 2007) investigated whether a categorical boundary affects color discrimination in speakers of Russian compared with speakers of English, using stimuli from the color range denoted as "blue" in English (Figure 12.2a). Participants were shown squares in different shades of blue arranged in a triad and were asked which of the two squares on the bottom matched the one on the top (Figure 12.2b). In some trials, the non-identical hue at the bottom was from the same color category as the target hue; in others, it was from the complementary color category. It turned out that shades of blue were indeed easier to distinguish if they belonged to different categories than if they belonged to the same category. Interestingly, this held particularly for stimuli projected onto the right visual field, which is connected to the left hemisphere of the brain (Gilbert et al., 2006). Brain areas involved in color perception and language processing are activated faster and more strongly by the distinction of colors that belong to different categories (Siok et al., 2009) and of colors that are easy to name (Tan et al., 2008).

Whether such effects of categorical boundaries originate from conscious processes has not been conclusively clarified. At first glance, it seems plausible that people use a naming strategy and that naming colors differently leads to the colors being remembered differently (Mitterer et al., 2009).


Figure 12.2: Stimuli used in the study by Winawer and colleagues (2007): the different shades of blue (a) and an example of a stimulus triad (b). Participants were asked to pick the one of the two squares from the lower row that matched the color of the single square above (© (2007) National Academy of Sciences, U.S.A.; permission to reprint granted).

However, participants themselves reported that the colors they categorized as different actually *looked* different (Kay & Kempton, 1984, p. 75). At least in this type of task, it seems, what tips the scales is not the explicit verbal naming of the color but rather an automatic, non-conscious activation of categorical information.

The studies on color perception are the classics in the field of linguistic relativity and have contributed greatly to advancing the field in terms of both theoretical clarification and methodological elaboration. Still, the potential effects of the (color) lexicon should not be overrated. While the lexicon provides options for differentiation from which speakers can choose, these options can easily be complemented should they turn out to be insufficient. If you deem neither "red" nor "brown" an appropriate label for the color of a chestnut, you can consider using "red-brown" or "chestnut-colored", or simply invent a new label such as "maroon". But the fact that color perception is strongly determined by biological and anatomical factors makes the findings in this domain all the more significant. After all, color vision is based on the exact same mechanism in all members of our species (apart from those with color blindness); the photoreceptor cells responsible for human color vision are particularly sensitive to certain wavelengths of light and hence to certain color experiences. In other words: If verbally mediated differentiations are able to modify cognitive processes even in this fundamental domain, an even stronger influence might be expected in domains in which biological constraints are less pronounced.

# 12.3.2 Language as Augmenter: The Case of Numerical Cognition

Handling numbers is a key skill in modern daily life. While mathematics is something that many people try to keep at arm's length, the ability to precisely assess the number of items in a set certainly strikes most people as utterly simple. And yet, competently dealing with numbers is not at all natural. A biologically evolved precondition that we share with many other species is the ability to perceive quantity. This includes the ability to keep track of up to four distinct items by way of immediate perception (called *subitizing*) and to approximately estimate larger quantities (Feigenson et al., 2004). By contrast, the ability for exact quantification (i.e., accurately assessing, remembering, and reconstructing numbers beyond the subitizing range) is uniquely human. It presupposes cultural mediation, specifically a cultural tool, and extensive training (Núñez, 2017). The prototypical tool essential for acquiring this competence is a conventionalized counting sequence: an ordered list of number representations (*numerals*), each of which refers to a clearly defined exact number (Wiese, 2003).

Not all natural languages comprise such counting sequences. Mundurukú, for instance, a language spoken in Amazonia, is counted among the few attested cases in which numerals do occur but lack precise numerical meaning. The fifth numeral, *pũg põgbi*, for instance, does not mean *precisely* 5, but only *roughly* 5, and can refer to values from 4 up to 12, depending on context (Pica et al., 2004). Pirahã, another Amazonian language, is claimed to comprise no numerals at all (Everett, 2005). Psychological studies in these two Amazonian groups indicate that the lack of precise numerals impairs the ability to exactly memorize, recall, and match larger quantities (Frank et al., 2008; Gordon, 2004; Pica et al., 2004). Similar issues are also observed in home-signers. A home-sign is a rudimentary sign language typically developed by deaf children of hearing parents. As home-signs are created in the absence of linguistic input, they typically lack conventionalized and stable counting sequences (Spaepen et al., 2011). Even students at American Ivy League universities experience the very same challenges in numerical tasks if they are prevented (e.g., by verbal interference) from actively using number words (Frank et al., 2012).

However, the potential for differences between languages is not confined to the presence or absence of counting sequences. Counting sequences themselves can also vary extensively in terms of their properties, which depend on the number and shape of the elements in a sequence, their order and relations, or the modality in which they are realized (Bender & Beller, 2012; Chrisomalis, 2010; Widom & Schlimm, 2012). Hardly any two number systems on this planet are therefore exactly alike. Even a number as simple and small as 5 can be denoted in fundamentally different ways (for some concrete examples, see Table 12.2): vaguely as "many", by an elementary word like "five", by a compound translating into "2 + 2 + 1", by five distinct notches on a stick (or four upright notches crossed by a transverse notch), or by presenting a hand with all fingers extended (or closed).

*Sources for Table 12.2*: Beller & Bender (2008), Frank et al. (2008), Pica et al. (2004), and Turner (1951).

The most obvious property in which counting sequences can differ is the modality in which they are implemented: through objects such as tally sticks or knotted strings (in the case of *material* systems); through fingers and body parts more generally (in the case of *body-based* systems); through number words (in the case of *verbal* systems); and through written notation such as the Hindu-Arabic digits or the Roman numerals (in the case of *notational* systems). Other properties (illustrated below) involve the presence or absence of a base and perhaps a sub-base, the size of such bases, or the regularity and transparency of how larger number words are composed.

Crucially, these properties have cognitive implications, that is, they affect how numbers are represented and processed (Bender & Beller, 2017; Bender et al., 2015; Schlimm & Neth, 2008). The *Hindu-Arabic digits*, for instance, constitute a decimal system; digits from 1 through 9 are represented by distinct symbols, the base 10 and the powers to which it is raised (e.g., 100, 1000, etc.) are represented by position (this is why the principle is often called "place-value": the value of a number is codetermined by its place). The *Roman numerals*, by contrast, constitute a system that uses sub-base 5 in addition to base 10; basic numbers are largely represented in a cumulative manner (as I, II, III), whereas sub-base, base, and their powers are represented by distinct symbols (V, X, L, C, D, and M). Due to this cumulative representation of basic numbers instead of a place-value principle, it is actually easier to execute basic arithmetic operations such as addition or multiplication with the (original) Roman numerals than with Hindu-Arabic digits (Schlimm & Neth, 2008).

Let us illustrate this for the addition of 26 and 17. All additions require both declarative and procedural knowledge. *Declarative knowledge* in the Hindu-Arabic system includes the numerical value to which a symbol refers as well as all 100 single-digit addition facts. In other words: One needs to know *beforehand* that the sum of 6 and 7 is 13, and that adding 1, 1, and 2 yields 4. *Procedural knowledge* includes, minimally, that numbers need to be written so that the smallest values (the rightmost digit in each number representation) are aligned, that numbers need to be added by position, starting from the right, and what to do with carries:

$$
\begin{array}{r}
{}^{1}\,2\,6\\
+\,1\,7\\
\hline
4\,3
\end{array}
$$
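The division of labor between the two knowledge types can be sketched in Python: the declarative part as a lookup table of the 100 single-digit addition facts, the procedural part as the align/add-from-the-right/carry routine. This is an illustrative sketch, not code from any of the cited studies:

```python
# Declarative knowledge: the 100 single-digit addition facts as a lookup table.
ADDITION_FACTS = {(a, b): a + b for a in range(10) for b in range(10)}

def column_add(x: str, y: str) -> str:
    """Procedural knowledge: align, add by position from the right, handle carries."""
    width = max(len(x), len(y))
    x, y = x.rjust(width, "0"), y.rjust(width, "0")  # align the smallest values
    digits, carry = [], 0
    for dx, dy in zip(reversed(x), reversed(y)):     # add by position, from the right
        total = ADDITION_FACTS[(int(dx), int(dy))] + carry
        digits.append(str(total % 10))
        carry = total // 10                          # what to do with carries
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(column_add("26", "17"))  # -> 43
```

Note that the procedure itself is trivial once the 100 addition facts are available; the bulk of the cognitive load in this system lies in the declarative table.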

Adding the very same numbers with Roman numerals (XXVI and XVII) also requires declarative and procedural knowledge. Here, however, the *declarative knowledge* only needs to include the order of the basic symbols (according to their value) and the simplification rules inherent in the counting sequence, such as IIIII → V and VV → X. *Procedural knowledge* gets by with a few very simple tricks: start by joining the symbols of the addends

**X X V I X V I I**

order them according to their values

**X X X V V I I I**

and then simplify, with V V → X:

**X X X X I I I**

As this example illustrates, the manner in which numbers are represented in each of the two systems has an impact on how numerical information is processed—some operations are just more straightforward with one type of representation than with another. This phenomenon is called the *representational effect* (Zhang & Norman, 1995), and it emerges not only for notational systems but for number systems in general (Bender & Beller, 2012, 2018).
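The Roman procedure just described (join, order, simplify) is simple enough to sketch directly. The simplification rules listed in the code are those inherent in the counting sequence; the function handles only additive (original) numerals, not later subtractive forms like IV:

```python
# Sketch of "join, order, simplify" addition with (original, additive) Roman
# numerals, as described in the text. Works only on additive notation.
ORDER = "MDCLXVI"  # symbols in descending value, used as the sorting key
SIMPLIFY = [("IIIII", "V"), ("VV", "X"), ("XXXXX", "L"), ("LL", "C"),
            ("CCCCC", "D"), ("DD", "M")]

def add_roman(a: str, b: str) -> str:
    # 1) join the symbols of the two addends
    joined = a + b
    # 2) order them according to their values (largest first)
    ordered = "".join(sorted(joined, key=ORDER.index))
    # 3) simplify repeatedly until no rule applies anymore
    changed = True
    while changed:
        changed = False
        for pattern, replacement in SIMPLIFY:
            if pattern in ordered:
                ordered = ordered.replace(pattern, replacement)
                # re-sort, since a replacement symbol may land out of order
                ordered = "".join(sorted(ordered, key=ORDER.index))
                changed = True
    return ordered

print(add_roman("XXVI", "XVII"))  # -> XXXXIII (i.e., 43 in additive notation)
```

No addition facts need to be memorized at all: the declarative knowledge reduces to the symbol order and the handful of simplification rules, just as the text argues.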

Another instance of this effect is that a system will be understood, learned, and mastered more easily if it is regularly structured and transparent. Compared to the number systems in East Asian and Polynesian languages (Bender et al., 2015; Miura, 1987), the systems in many Indo-European languages, including English, are rather irregular. The number words from 1 through 10 are distinct and arbitrary, as in all decimal verbal systems. Once base 10 is reached, starting a new counting cycle with regularly composed number words such as "ten-and-one", "ten-and-two", etc. would reveal the base-10 structure. English, however, blurs this structure with its specific number words "eleven" and "twelve". Not even "thirteen" is recognizable as "ten-and-three", which is why only at "fourteen" may a novice begin to sense a recurrent pattern in the suffixed "-teen" (Bender & Beller, 2018). Moreover, the difference between numerals like "thirteen" and "thirty" hinges on the crucial distinction between -*teen* and -*ty*, even though both refer to the same number (10) and should therefore actually be identical. As a consequence of these irregularities, English-speaking children take more time than Chinese-speaking children to learn their system and require more effort to grasp its decimal structure and the algorithms based on it (Fuson & Kwon, 1991; Miller et al., 1995).
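The contrast in transparency can be made concrete with a toy generator. The hyphenated "ten-one" forms below are schematic glosses of a regular decimal pattern (as found, for example, in Chinese), not actual words of any language:

```python
# A toy illustration of regular vs. irregular composition of the "teen" words.
# "ten-one", "ten-two", ... are schematic glosses of a transparent decimal
# pattern, not actual words of any language.
ONES = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
ENGLISH = ["eleven", "twelve", "thirteen", "fourteen", "fifteen",
           "sixteen", "seventeen", "eighteen", "nineteen"]

def regular_teen(n: int) -> str:
    """Compose 11..19 transparently: the base-10 structure stays visible."""
    return "ten-" + ONES[n - 11]

def english_teen(n: int) -> str:
    """English 11..19: partly opaque forms that blur the base-10 structure."""
    return ENGLISH[n - 11]

for n in (11, 12, 13):
    print(n, regular_teen(n), english_teen(n))
# 11 ten-one eleven
# 12 ten-two twelve
# 13 ten-three thirteen
```

In the regular system, a single composition rule covers the whole range; in English, the learner must memorize each opaque form before the "-teen" pattern becomes visible.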

#### 12.4 Language as Tool for Thought

As should have become clear by now, language is an important tool for thought, aiding the coding, categorization, and storing of information as well as processes of reasoning, decision making, and problem solving. As we know from classic experiments in psychology (see Chapter 9, "Problem Solving"), the properties of a tool and the habits acquired during its usage tend to affect how the tool is applied. One instance illustrating this influence is the phenomenon of *functional fixedness*: the tendency to use a tool in conventional ways even if a new problem requires a novel application (Duncker, 1935; Glucksberg & Danks, 1968). A second instance is the so-called *set effect* (or *Einstellung effect*): the tendency to stick to a procedure that has worked before even if the new problem requires a novel approach (Luchins & Luchins, 1959). Applied to the case of language, we distinguish in this last section two states of familiarity: the standard state of a familiar language serving as a well-known tool (section 12.4.1), and the implications that arise from using a foreign language as an unfamiliar tool (section 12.4.2). In contrast to the familiar tool, which reinforces our cognitive habits, the unfamiliar tool seems to reset these habits to some extent.

# 12.4.1 Familiar Tool: Thinking by Language

The first language we acquire is our native language or mother tongue, and this language is with us during major parts of cognitive development, while we learn to categorize the things we perceive, discover the world of numbers, or try to figure out solutions for reasoning tasks and decision problems. As noted by developmental psychologists in the tradition of Vygotsky and Piaget, language and thought become entangled in a complex relationship during this process. In other words: language itself is like a glue that keeps our non-domain-specific, cross-modular, propositional thoughts together, "not just in the sense that language is a necessary condition for us to entertain such thoughts but in the stronger sense that natural language representations are the bearers of those propositional thought-contents" (Carruthers, 2002).

An example of the crucial role of language is the emergence and development of a "theory of mind" in children, which seems to benefit greatly from linguistic support (Pyers & Senghas, 2009; de Villiers, 2007). Theory-of-mind abilities emerge in all normally developing human children; their onset, however, depends on the amount of mental-state talk in parent-child interactions. For instance, whereas in the Western world reflections on others' mental states are a topic of widespread interest and conversation, numerous societies across the world appear to adopt a perspective according to which mental states are private and opaque (Luhrmann, 2011). This reluctance to openly speculate about the feelings, intentions, or thoughts of others affects the ease with which children acquire an understanding of such notions (Vinden, 1996; Wassmann et al., 2013).

A second example illustrating the role of language for cognition in a more general sense is numerical cognition, for here the invention of number words was indispensable for processes of counting and calculating (see section 12.3.2). In this case, specific linguistic representations are so essential for cognitive processing that they are considered a component of cognition itself. Such instances constitute cases of *extended cognition*, in which information is distributed across both mental and non-mental states and in which cognitive processing involves both types of information (Clark & Chalmers, 1998; Hutchins, 1995; Norman, 1993). For instance, the addition of 26 and 17 described earlier requires information on what each numerical symbol means and how to execute a column addition (stored mentally), but also relies on the presence of the numerals (stored on a piece of paper).

Explanations of why using language as a tool would affect thought follow a slightly different track where perception and categorization are concerned. Explanations in this domain are based on the well-known fact that information processing unfolds as an interplay between bottom-up processing of sensory signals and top-down predictions about what these signals might be. In this interplay, language plays a key role in that it serves as a main source for generating predictions. If these predictions happen to match the stimulus perceived, they help to discover things that would otherwise have been missed (Lupyan & Clark, 2015; Lupyan & Ward, 2013).

This approach has been refined by Cibelli and colleagues (2016) for the controversial case of color perception. It takes as its point of departure the *category adjustment model* proposed by Huttenlocher and colleagues (1991), according to which we tend to use information from two different sources when we have to draw inferences under uncertainty. One source is a fine-grained representation of the perceived stimulus itself; the other is a categorical system devoted to the organization of perceptions and memories. If, for instance, we try to recall the exact color of a stimulus, the two sources would be the color seen and the linguistic color category into which it falls. An influence of language on memory would here be diagnosed when the recalled color shade shifts in the direction of the prototypical shade of the respective color category. This shift should be stronger the less certain we are about our sensory impression, for instance because the stimulus perception itself was imprecise or because our memory of it is fading.

It is exactly this correlation that Cibelli and colleagues (2016) observed, both in empirical studies in which they manipulated the time span between the presentation of the stimulus and the recall of the memory, and in computer simulations of data from cross-linguistic studies. Their account also provides an elegant explanation of why effects of linguistic relativity are not always reliably replicated—namely, when experimental designs enable relatively high degrees of certainty in participants' perception or memory. Finally, this model also makes it possible to account for influences of language on cognition while at the same time positing a universal foundation for cognition.
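A minimal numerical sketch of the category adjustment model may help: the recalled hue is modeled as a weighted average of the fine-grained stimulus trace and the category prototype, with the prototype weighted more heavily the less certain the trace. All numbers are illustrative placeholders, not fitted parameters from Cibelli et al. (2016):

```python
# Minimal sketch of the category adjustment model applied to color memory:
# the recalled hue is a weighted average of the fine-grained stimulus trace
# and the prototype of its linguistic category. Lower certainty shifts the
# estimate toward the prototype. All numbers are illustrative placeholders.
def recalled_hue(stimulus: float, prototype: float, certainty: float) -> float:
    """certainty in [0, 1]: 1 = perfect stimulus memory, 0 = pure category guess."""
    return certainty * stimulus + (1.0 - certainty) * prototype

fresh = recalled_hue(stimulus=200.0, prototype=220.0, certainty=0.9)  # short delay
faded = recalled_hue(stimulus=200.0, prototype=220.0, certainty=0.4)  # long delay

print(fresh < faded)  # True: the faded memory has shifted further toward the prototype
```

The model thus predicts exactly the correlation the studies report: the longer the delay (i.e., the lower the certainty), the stronger the pull of the linguistic category.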

In the two instances described in sections 12.3.1 and 12.3.2, language is actively used as an aid to coding, storing, and reasoning: color terms as a tool for identifying and memorizing colors, and number words as a tool for counting and calculating. In these cases, language directly affects cognitive processes, either because the linguistic representations enter into conflict with non-linguistic representations (*language as meddler*) or because the linguistic representations support, augment, or even make possible the non-linguistic representations (*language as augmenter*). Typically, this kind of *online* influence diminishes when participants are prevented from making use of language, for instance by way of a verbal interference task (e.g., Frank et al., 2012; Roberson & Davidoff, 2000). The same holds, of course, for instances of *thinking for speaking*, as in the absence of a need for speaking the effect will not arise. Instances of *thinking after language* are different. Here, the language-inherent need to pay attention to some information more than other information has led to a form of habituation that renders grammatically relevant aspects salient (*spotlight effect*) even without the immediate involvement of language. An indirect or *offline* influence of language like this is less likely to be suppressed by verbal interference.



# 12.4.2 Unfamiliar Tool: Thinking in a Foreign Language

Speaking a second language has implications for how one thinks. While habituated patterns of thought typically develop in line with the dominant language, bilinguals seem to switch between patterns of thought, rather than transferring the pattern from their dominant to the non-dominant language (Kousta et al., 2008). In fact, learning a new language with novel grammatical categories appears to entail a cognitive restructuring in the bilingual mind (Athanasopoulos, 2007). But using a second language *while* thinking may also have more general effects on the outcome of the thinking process.

Keysar and colleagues (2012) first described what has since been called the *foreign language effect*: When their participants worked on a set of classic decision tasks in a foreign language, their decisions differed significantly from those observed with the same type of problems in their native language. A robust finding in this research field is, for instance, that the decisions we make depend upon framing (Tversky & Kahneman, 1981): We avoid risks if the task is framed positively (as something we can gain), but are risk-seeking if the—actually identical—outcome is framed negatively (as a loss), as in the case of the "Asian disease" task (Table 12.3).

The participants in the study by Keysar and colleagues (2012) exhibited the well-known pattern when working on the task in their native language. When working on it in a foreign language, however, the extent to which they opted for the safe versus risky option was independent of the framing.

A series of studies has since documented such a foreign language effect for various tasks and contexts, including gambling, mental bookkeeping, risk awareness, or moral judgments (overview in Hayakawa et al., 2016). In moral dilemmas, for instance, people using a foreign language are more inclined to make utilitarian decisions by weighing the result more strongly than the means or intentions that lead to it (Geipel et al., 2016). When confronted with the (hypothetical) dilemma of sacrificing one human life to save five others, participants find it more acceptable to do so if they only have to hit a switch (thereby diverting a trolley so that it runs over a single person instead of five people) than if they were to actively push the single person from a bridge (thereby bringing the trolley to a halt and preventing it from running over the five people). The outcome is the same in both cases (five lives saved at the cost of one), but the reluctance is normally much greater in the second case. If, by contrast, the dilemma is presented in a foreign language, the greater good outweighs the moral rule of not inflicting damage on another person, and pushing the single person appears much more acceptable (Costa et al., 2014; Geipel et al., 2015).

The exact mechanism underlying such effects of foreign language usage is not yet clear. Keysar and colleagues (2012) interpret their findings as evidence for the assumption that the cognitive processing in the foreign language is accompanied by a greater psychological distance and is not anchored emotionally to the same extent as is the case for processing in the native language (see also Hayakawa et al., 2016; Pavlenko, 2012). This would also explain why swearwords appear less insulting, declarations of love less romantic, and books less exciting in a foreign language (Caldwell-Harris, 2015).

# 12.4.3 Conclusion

For several decades, the principle of linguistic relativity was disregarded as a topic of interest in the cognitive sciences, largely due to Chomsky's influence. Reintroduced as a topic worthy of scientific investigation in the 1990s (Gumperz & Levinson, 1996; Lee, 1996; Lucy, 1992a, 1992b, 1997), it is today one of the most thriving and thrilling fields in cognitive science (e.g., Boroditsky, 2001; Dolscheid et al., 2013; Gentner & Goldin-Meadow, 2003; Haun et al., 2011). As mentioned in the introduction, the discussion is still controversial, but evidence in support of at least some versions of linguistic relativity is accumulating. The same is true for theoretical attempts to reconcile the idea that cognition may be susceptible to influences of language on the one hand with one of the key assumptions of cognitive science, the universality of cognitive processes, on the other (e.g., Cibelli et al., 2016; Lupyan & Clark, 2015).

Language provides structure that leads us to pay more attention to some information than to other information; it provides categorical systems that are used to adjust uncertain assessments; and it provides conceptual bricks that help scaffold cognitive skills. Still, we are not at the mercy of these tools—if they cease to serve their purpose or to achieve their goal, we are able, and apt, to adjust them, for instance by simply inventing new color terms or increasing the range of number words needed for counting (Beller & Bender, 2008). It is exactly for this reason that humans, over the history of their species, were able to attain ever greater goals with increasingly well-suited tools (Miller & Paredes, 1996). This also holds for language as our most important tool for thought.

# Acknowledgment

This chapter is based in large parts on collaborative work with my former colleague and partner, the late Sieghard Beller, to whom I remain indebted for inspiration, fruitful exchange of ideas, and critical feedback. I also wish to thank Stina Hellesund Heggholmen for proofreading and helpful comments on an earlier draft as well as Julia Karl for her terrific work with the typesetting.

# Summary

According to the principle of linguistic relativity, most prominently proposed by Whorf, the language we speak affects the way we think. Three theses are central to this account: that languages differ in how they describe the world; that the way in which a language describes the world affects the experiences had by its speakers; and that speakers of different languages therefore have different experiences. The underlying idea is still controversial in parts of cognitive science, but evidence is accumulating in support of its three most plausible readings, namely that language may affect thought in terms of thinking *before* language (as *thinking for speaking*), *with* language (as *meddler* or as *augmenter*), and *after* language (as *spotlight*). In this chapter, we summarize research on four domains, to illustrate arguments and approaches in the field. In order to raise awareness for critical issues, we begin with grammatical gender, originally claimed as an instance of the spotlight effect, but used here as a counter-example. More convincing instances are spatial references (for the spotlight effect), the influence of the color lexicon on color categorization (language as meddler), and the role of number words for numerical cognition (language as augmenter). In conclusion, we elaborate on the role of language as a tool for thought, including the differences that occur when using a foreign language while thinking.

#### Review Questions


### Hot Topic: Is Grammatical Gender an Instance of Linguistic Relativity?

The relationship between culture, language, and cognition, as well as their (co-)evolution, has fascinated me since the beginning of my academic career when I was working as a cultural anthropologist, and it constitutes the main area of my research in cognitive science and psychology today. My interests include number representations and their cognitive implications, spatial and temporal references, the evolution and cultural constitution of causal cognition, and the possible influence of linguistic categories on thought (known as *linguistic relativity*).

Andrea Bender

A topic that has been controversially debated for decades is whether grammatical gender qualifies as an instance of linguistic relativity. In languages with a formal gender system, all nouns are assigned to one of several classes that determine the declension of associated words. For instance, the moon has masculine gender in German (*der Mond*), whereas the sun has feminine gender (*die Sonne*). Is, therefore, the sun conceived as more feminine than the moon by German speakers? One indicator for such an influence is the "gender congruency effect". It emerges if the grammatical gender of a noun (masculine for *Mond*) is congruent with the association of its referent with a specific sex (here: as male).

In previous research, participants were often directly asked for such associations. A major issue with explicit measures like this is that information on grammatical gender can be actively used to aid the decision. In our own work with speakers of German, we therefore used an implicit measure. Participants were asked to categorize nouns according to criteria not obviously related to gender associations. Critically, the stimuli themselves constituted either congruent or incongruent cases; faster and/or more accurate responses in the congruent than the incongruent cases would then attest to a gender congruency effect. We examined nouns for which grammatical gender and biological sex were congruent or incongruent (Bender, Beller, & Klauer, 2016a), for which grammatical gender and allegorical association were congruent or incongruent (Bender, Beller, & Klauer, 2016b), or for which grammatical gender was related to sex (masculine/feminine) or not related to sex (neuter) (Bender, Beller, & Klauer, 2018). Across these studies, a gender congruency effect emerged for all those nouns that had strong male or female connotations, almost regardless of their gender, suggesting that the semantic association of the nouns has a much stronger effect than their grammatical gender.

#### References


Bender, A., Beller, S., & Klauer, K. C. (2018). Gender congruency from a neutral point of view: The roles of gender classes and gender-indicating articles. *Journal of Experimental Psychology: Learning, Memory, and Cognition*, *44*, 1580–1608. doi:10.1037/xlm0000534

# References


*ematical cognition and learning: Language and culture in mathematical cognition* (pp. 297–320). Cambridge, MA: Academic Press. doi:10.1016/b978-0-12-812574-8.00013-4


Implications for everyday life. *Current Directions in Psychological Science*, *24*, 214–219. doi:10.1177/0963721414566268


*and Verbal Behavior, 7*, 72–76. doi:10.1016/s0022-5371(68)80166-5


mantischen Gehalt? *Psychologische Rundschau*, *58*, 171–182. doi:10.1026/0033-3042.58.3.171



nian indigene group. *Science*, *306*, 499–503. doi:10.1126/science.1102085




# Glossary


**Linguistic relativity** The principle that speakers of different languages are influenced in their perceptions and categorizations by the linguistic structures implicit in their mother tongue.


# Chapter 13

# Expertise

#### DAVID Z. HAMBRICK

Michigan State University

People are capable of remarkable feats. Examples range from the everyday—such as the waiter who can remember a dozen orders without writing them down—to the esoteric—such as the chess master who simultaneously plays (and beats) dozens of opponents while blindfolded—to the epic—such as Bob Beamon's belief-defying long jump of over 29 feet in the 1968 Mexico City Olympics.

What sets elite performers apart from everyone else? Invariably, they have a history of training in their domain. This is true even of people who progress extremely rapidly. For example, the Norwegian chess great Magnus Carlsen took around 5 years of serious involvement in chess to attain grandmaster status (Gobet & Ereku, 2014). Simply put, there are no "instant" experts.

As a scientific concept, expertise may be defined as a person's current level of performance in a complex task. This could be a hobby, such as playing a musical instrument, or a sport, or an occupational task, such as diagnosing a patient. It could also be an everyday task, such as recognizing faces or driving. A major unanswered question in research on expertise is the extent to which performers' history of training in a domain accounts for individual differences in expertise (i.e., differences across people in domain-specific performance). For example, is it the amount or intensity of training alone that distinguishes Serena Williams from her highly skilled, but less successful, competition on the Women's Tennis Association Tour?

This chapter reviews evidence concerning this question and is divided into four sections. The first section provides a brief history of research on expertise, from prehistory to present. The second section focuses on theoretical debates in contemporary expertise research, and particularly the role of training history in explaining individual differences in expertise. The third section describes a multifactorial perspective on expertise, and the final section discusses directions for future research.

# 13.1 The Science of Expertise: A Brief History

There is no denying that some people acquire complex skills much more rapidly, and reach a much higher level of ultimate performance, than other people. Consider the American golfer Babe Didrikson Zaharias, pictured in Figure 13.1. An extraordinary athlete, Zaharias was an All-American basketball player in high school, and went on to win gold medals in the hurdles and javelin in the 1932 Los Angeles Olympics (van Natta, 2013), equaling her world record in the former. Reports of when Zaharias began playing golf vary. According to legend, she shot a respectable 91 the first time she ever played golf. This is almost certainly not true; as a *Sports Illustrated* profile noted, "In truth she had played a great deal of golf, beginning as a high school student in Beaumont and continuing in Dallas, where she often hit 1,000 balls a day" (Babe, 1975). Nevertheless, it is clear that Zaharias' ascent to golfing greatness was rapid. Her first significant victory came in 1935 at the Texas Women's Amateur, and only five years later, she won a major championship, the Western Women's Open. She went on to become one of the best golfers in history, winning 41 professional tournaments, including 10 major championships. In 1951, she was inducted into the World Golf Hall of Fame (Babe, 1975; van Natta, 2013).

Figure 13.1: Babe Didrikson Zaharias.

Millions of people play golf, but only a handful have played it as well as Zaharias did. Why is this so? What characteristics did Zaharias possess that set her apart from nearly everyone else who has ever played the game? And did she acquire all those characteristics through training? More generally, what underlies individual differences in expertise? To provide context for the contemporary debate surrounding this question, let's begin with a brief history of scientific research on expertise.

### 13.1.1 Prehistory to Antiquity

The term *expertise* did not come into common usage in the English language until the 1950s (Hambrick & Campitelli, 2018). However, there is no reason to doubt that even early humans differed in their skill in complex tasks. Presumably, some prehistoric people were more skilled than others at producing and using tools, painting on cave walls, and other tasks of prehistoric life. What did these people think about the origins of these differences? It is impossible to know—by definition, prehistory is the period before written records—but they likely attributed them to supernatural forces. We do get a sense from prehistoric art that early humans were just as captivated by displays of skill as we are today. Paintings from the paleolithic era in the Lascaux cave in France estimated to be 20,000 years old include images of wrestlers and sprinters, and in the Cave of Swimmers in present-day Egypt, depictions of archers and swimmers date to 6,000 B.C.E.

Many millennia later, the Ancient Greeks laid the foundation for the contemporary debate over the origins of expertise. In *The Republic* (ca. 380 B.C.E.), Plato made the *innatist* argument that "no two persons are born alike but each differs from the other in individual endowments." Aristotle countered with the *empiricist* argument that experience is the ultimate source of knowledge (Stanford Encyclopedia of Philosophy, 2015). More than two thousand years later, in the mid-19th century, these contrasting philosophical views would frame the scientific debate over the origins of expertise in the new field of psychology. The debate has raged on ever since.

### 13.1.2 The Classical Era

Born in 1822 into a prominent family of British scientists, Francis Galton was a polymath—a person with wide-ranging learning and knowledge. Over the course of his long career, he published hundreds of scholarly articles, on topics as varied as sociology, geography, anthropology, meteorology, psychology, and statistics (Gillham, 2001). Galton also popularized what is undoubtedly the most often repeated phrase in the social and behavioral sciences: *nature and nurture* (Fancher, 1979).<sup>1</sup> "Nature is all that a man brings with himself into the world; nurture is every influence without that affects him after his birth", he wrote in *English Men of Science: Their Nature and Nurture* (1874).

<sup>1</sup> Galton is often credited with coining (originating) the phrase "nature and nurture", but the juxtaposition predates him by centuries (see Fancher, 1979). In his 1582 pedagogical guide *Elementarie*, Richard Mulcaster observed, "Nature makes the boy toward, nurture sees him forward" (Teigen, 1984). And in Shakespeare's *The Tempest*, Prospero describes Caliban as "A devil, a born devil, on whose nature / Nurture can never stick."

In 1859, Galton's half-cousin Charles Darwin had published *On the Origin of Species*, laying out his theory of evolution. In a nutshell, Darwin's thesis was that the distinctive features of a species—whether the length of a giraffe's neck or the peacock's brilliant plumage—emerge through a process of *natural selection* whereby traits that help the species survive and reproduce in their habitat are passed from parents to offspring. Galton believed that natural selection operates on human abilities, too. As he wrote in his book *Hereditary Genius*, "a man's natural abilities are derived by inheritance, under exactly the same limitations as are the form and physical features of the whole organic world" (Galton, 1869, p. 1). To make his case, using biographical dictionaries, Galton identified nearly a thousand "men of reputation"—people who had made eminent contributions in various fields, such as Wolfgang Amadeus Mozart, Isaac Newton, and Napoleon Bonaparte. By analyzing their family trees, he then documented that these people represented just 300 families, suggesting that biological relatedness had something to do with their success. For example, he noted that the "Bachs were a musical family, comprising a vast number of individuals, and extending through eight generations. . . . There are far more than twenty *eminent* musicians among the Bachs" (p. 240). Galton concluded that eminence arises from "natural ability" and went so far as to conclude that "social hindrances cannot impede men of high ability, from becoming eminent [and] social advantages are incompetent to give that status, to a man of moderate ability" (p. 41). For Galton, greatness overwhelmingly reflected nature.

Darwin was effusive in his praise for *Hereditary Genius*. "I do not think I ever in all my life read anything more interesting and original", he wrote to Galton in a letter dated December 23rd [1869]. Others were less enthusiastic. One reviewer took issue with Galton's definition of eminence, complaining that one family of lawyers that Galton had included in his analysis "possessed a most extraordinary hereditary genius—*for getting on at the bar*" (Hereditary Talent, 1870, p. 119). Another reviewer, writing in the *British Quarterly Review* (1870), dismissed Galton as a "Darwinite"—an intended insult Galton almost certainly took as a compliment—and chastised him for oversimplifying genius. More substantively, based on results of his own study of the backgrounds of eminent scientists, the Swiss botanist Alphonse Pyrame de Candolle (1873) argued that Galton had drastically underestimated the role of favorable environmental circumstances (*causes favorables*) in achieving greatness. He noted, for example, that Switzerland had produced 10% of the scientists in his sample despite representing just 1% of the European population (Fancher, 1983).

Decades later, the learning theorist Edward Thorndike (1912) entered the fray, observing that "when one sets oneself zealously to improve any ability, the amount gained is astonishing" (p. 108), and adding that "we stay far below our own possibilities in almost everything we do. . . not because proper practice would not improve us further, but because we do not take the training or because we take it with too little zeal." (p. 108). Taking a more extreme stance, John Watson (1930), the founder of behaviorism, famously wrote:

Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select—doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors. (p. 104)

The pendulum had swung from nature—the view that heredity places strict limits on what a person can achieve—to nurture—the view that there are essentially no limits to what a person can achieve under the right circumstances.

### 13.1.3 The Modern Era

In the 1930s, the behaviorist mantle was picked up by B. F. Skinner. Skinner rejected as unscientific any notion of mental constructs—the *mind*—in psychological theorizing (Skinner, 1938). He believed that the science of psychology must focus only on what could be objectively observed: environmental stimuli and behavioral responses. Skinner's "S-R psychology" had a monumental influence on psychological research. By the 1950s, however, there was growing dissatisfaction with behaviorism as an approach to answering important questions in psychology, such as how we humans acquire our marvelous capacity to use language (Fancher & Rutherford, 2012; Gardner, 1985). In a critique of Skinner's book *Verbal Behavior* (1957), which attempted to explain language in purely S-R terms, the linguist Noam Chomsky (1959) commented that the "magnitude of the failure of this attempt to account for verbal behavior serves as a kind of measure of the importance of the factors omitted from consideration" (p. 28). Around the same time, computer science emerged as an academic discipline. The digital processing device—the computer—provided psychologists with a powerful new metaphor for conceptualizing human thought and behavior. Rather than being seen only in terms of S-R relationships, behavior could now be seen as the product of mental operations carried out on information. The cognitive revolution was underway.

A pioneer of this new paradigm was the Dutch psychologist Adriaan de Groot (1946/1965). An international chess master who twice represented the Netherlands in the Chess Olympiad, for his dissertation research de Groot endeavored "*to carry out an experimentally based psychological analysis of chess thinking*" (p. 13). To this end, he recruited chess players representing a wide range of skill—from grandmaster to master to less skilled—and had them perform "choice-of-move" problems in which they were given game positions and asked to verbalize their thoughts (to "think out loud") as they deliberated on what move to make. de Groot found that the grandmasters were no different than less skilled players in how many moves ahead they thought. Instead, he found that the grandmaster "immediately 'sees' the core of the problem in the position, whereas the expert player finds it with difficulty or misses it completely..." (p. 320). de Groot also had chess players representing different levels of skill briefly view chess positions and then attempt to reconstruct the positions by placing pieces on an empty board. de Groot found a large advantage of chess skill in recall: the grandmaster and master averaged over 90% correct, the expert only about 70%, and the weakest player just over 50%.

Inspired by de Groot's research, beginning in the 1970s the Carnegie Mellon University scientists William Chase and Herbert Simon conducted a series of studies on chess expertise (Chase & Simon, 1973). (Simon, incidentally, was another polymath: in 1978, he won the Nobel Prize in economics for his concept of bounded rationality.) Replicating de Groot's (1946/1965) study using more controlled procedures, Chase and Simon began by showing participants representing three levels of chess skill—novice, intermediate, and master—arrangements of chess positions that were either plausible game positions or random, and then had the participants attempt to recreate the arrangements from memory by placing chess pieces on a board. Chase and Simon found that chess skill facilitated recall of the game positions but not the random positions, and therefore concluded that the primary factor underlying chess skill is a large "vocabulary" of game positions that automatically elicit candidate moves. More generally, they concluded that although "there clearly must be a set of specific aptitudes...that together comprise a talent for chess, individual differences in such aptitudes are largely overshadowed by immense differences in chess experience. Hence, the overriding factor in chess skill is practice" (Chase & Simon, 1973, p. 279).

A research movement—the Carnegie Mellon School—emerged around Chase and Simon's work. In the spirit of Watson (1930), the main argument of this movement was that nurture prevails over nature in expert performance: the "software" of the cognitive system—acquired knowledge structures—rather than the "hardware"—genetically-influenced abilities and capacities—underlies skilled performance. In one dramatic demonstration of this point, Ericsson, Chase, and Faloon (1980) reported a case study of a college student (S.F.), who after more than 230 hours of practice in the lab increased the number of random digits he could recall by a factor of ten, from a typical 7 to 79 digits. Verbal reports revealed that S.F., an accomplished track runner, recoded 3- and 4-digit sequences as running times, ages, or dates, and developed a strategy for encoding the groupings into long-term memory *retrieval structures*. Ericsson et al. concluded that there is "seemingly no limit to improvement in memory skill with practice" (p. 1182; the current record for digit memorization, set by Lance Tschirhart at the 2015 World Memory Championships, is a bewildering 456 digits.) In another fascinating study, Ericsson and Polson (1988) studied a waiter (J. C.) who could remember up to 20 dinner orders without writing them down using a mnemonic system.

The movement gained momentum in the early 1990s with publication of the article that is now the most highly cited article in the expertise literature (to date, the article has been cited nearly 10,000 times). K. Anders Ericsson and his colleagues (Ericsson, Krampe, & Tesch-Römer, 1993) proposed that individual differences in performance in domains such as music, chess, and sports largely reflect differences in the amount of time people have spent engaging in deliberate practice. Reminiscent of Thorndike's (1912) idea of "proper practice", Ericsson et al. defined deliberate practice as engaging in structured training activities that have been specifically designed to improve performance in a domain. To test this idea, Ericsson and colleagues reported results of two studies showing that elite musicians (violinists and pianists) had accumulated thousands of hours more deliberate practice than less accomplished counterparts.

Applying their framework to several domains, Ericsson et al. (1993) concluded that "high levels of deliberate practice are necessary to attain expert level performance" (p. 392), and in the next sentence added:

Our theoretical framework can also provide a sufficient account of the major facts about the nature and scarcity of exceptional performance. Our account does not depend on scarcity of innate ability (talent). . . .We attribute the dramatic differences in performance between experts and amateurs—novices to similarly large differences in the recorded amounts of deliberate practice (p. 392).

For the next two decades, the deliberate practice view was the dominant theoretical perspective on human expertise.

# 13.2 Testing the Deliberate Practice View

The research movement that de Groot set in motion, Chase and Simon cultivated, and Ericsson and colleagues advanced has had a tremendous impact not only on scientific thinking about the origins of expertise, but on the lay public's understanding of the topic. Particularly over the past decade, there has been an explosion of popular interest in expertise. In his bestselling book *Outliers: The Story of Success*, the writer Malcolm Gladwell described Ericsson and colleagues' research on musicians and quipped that 10,000 hours is the "magic number of true expertise" (p. 40). The "10,000 hour rule" was, in turn, the inspiration for Macklemore and Ryan Lewis's rap song by the same title, which was used as the theme music for a Dr. Pepper soft drink commercial. Other popular books that have featured findings from Ericsson and colleagues' research include *Bounce: The Myth of Talent and the Power of Practice* (Syed, 2010), *Talent is Overrated: What Really Separates World-Class Performers from Everybody Else* (Colvin, 2010), *The Talent Code: Greatness Isn't Born, It's Grown. Here's How* (Coyle, 2009), and *The Genius in All of Us* (Shenk, 2010). In their own popular book, *Peak: Secrets from the New Science of Expertise*, Ericsson and Pool (2016) stated, "There is no reason not to follow your dream. Deliberate practice can open the door to a world of possibilities that you may have been convinced were out of reach. Open that door" (p. 179).

Nevertheless, Ericsson and colleagues' view has been highly controversial in the scientific literature from the start (see Hambrick et al., 2016, for a discussion). The major criticism is that Ericsson and colleagues have overstated the importance of deliberate practice (for a sample of critiques, see Ackerman, 2014; Anderson, 2000; Gagné, 2013; Gardner, 1985; Marcus, 2012; Schneider, 1998, 2015; Tucker & Collins, 2012; Winner, 1996). The critical question is whether the deliberate practice view is supported by evidence. A theory is scientific insofar as it generates testable predictions: propositions that can be evaluated by collecting and analyzing data. A central claim of the deliberate practice view is that "individual differences in ultimate performance can *largely be accounted for* by differential amounts of past and current levels of practice" (Ericsson et al., 1993, p. 392, emphasis added).

In any straightforward sense of the word *largely*, this claim leads to the prediction that deliberate practice should, at the very least, account for *the majority* of the between-person differences in expertise. Does it? The available evidence indicates no. My colleagues and I reanalyzed the results of studies from two of the most popular domains for expertise research: chess and music (Hambrick, Oswald, Altmann, Meinz, Gobet, & Campitelli, 2014). On average, after correcting for the unreliability of the measures<sup>2</sup>, deliberate practice accounted for 34% of the between-person variance in chess expertise and 30% of the between-person variance in music expertise, leaving the rest of the variance potentially explainable by factors other than deliberate practice. A meta-analysis focusing on music by another group of researchers (Platz, Kopiez, Lehmann, & Wolf, 2014) revealed similar results: deliberate practice explained 37% of the reliable variance in music performance (see Figure 13.2). Subsequently, my colleagues and I performed a meta-analysis of the relationship between deliberate practice and performance in five domains: music, games, sports, education, and professions (Macnamara, Hambrick, & Oswald, 2014). In each domain, deliberate practice left more of the variance in performance unexplained than it explained, even assuming liberal corrections for the unreliability of the measures.
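The arithmetic behind these variance figures is simple: square the correlation, optionally after applying Spearman's correction for attenuation to adjust for measurement unreliability. A minimal Python sketch (the reliability values of .80 and .85 in the second example are hypothetical, chosen only to illustrate the correction):

```python
import math

def corrected_r(r_observed, rel_x, rel_y):
    # Spearman's correction for attenuation: divide the observed
    # correlation by the square root of the product of the two
    # measures' reliabilities.
    return r_observed / math.sqrt(rel_x * rel_y)

def variance_explained(r):
    # Share of between-person variance accounted for: r squared.
    return r ** 2

# Platz et al. (2014) report an average corrected r = .61 between
# deliberate practice and music performance.
print(round(variance_explained(0.61) * 100))    # -> 37 (percent)

# Hypothetical reliabilities show how correcting an observed
# correlation of .50 raises the estimate:
print(round(corrected_r(0.50, 0.80, 0.85), 2))  # -> 0.61
```

Note the asymmetry this creates: even a corrected correlation of .61, which is large by the conventions of psychological research, still leaves well over half of the reliable variance unexplained.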

Figure 13.2: Results of Platz, Kopiez, Lehmann, and Wolf's (2014) meta-analysis of the deliberate practice-music performance relationship. The pie chart represents the total reliable variance in music performance (i.e., avg. corrected *r* = .61; .61<sup>2</sup> x 100 = 37%). The light gray slice represents the amount of reliable variance explained by deliberate practice; the dark gray slice represents the amount not explained by deliberate practice. The meta-analysis included 14 studies.

<sup>2</sup> The reliability of a measure, which is an index of how much random measurement error it contains, limits the degree to which that measure can correlate with any other measure.

In practical terms, this evidence implies that people may require vastly different amounts of deliberate practice to reach a given level of expertise. This point can be illustrated with results of a study of chess skill by the cognitive psychologists Guillermo Campitelli and Fernand Gobet (Gobet & Campitelli, 2007; Campitelli & Gobet, 2011). Recruiting their participants from a Buenos Aires chess club, they had chess players provide estimates of the amount of time they had spent on deliberate practice for chess and report their official chess rating. As expected, there was a positive correlation between deliberate practice and chess rating; the higher-rated players reported having accumulated more deliberate practice than the lower-rated players. However, the correlation was only moderate in magnitude (*r* = .42), indicating that some players required much more deliberate practice to reach a given level of skill than other players did. Indeed, the amount of deliberate practice required to reach "master" status ranged from 3,016 hours to 23,608 hours—a difference of nearly a factor of 8. Furthermore, some players had accumulated more than 25,000 hours of deliberate practice without reaching the master level.

A further illustration of this point comes from a study in which children were trained to identify musical pitches. Sakakibara (2014) enrolled children from a private Japanese music school in a training program designed to train absolute (or "perfect") pitch—the ability to name the pitch of a tone without hearing another tone for reference. Nearly all the children (22 of 24) completed the training and reached the criterion (the drop-outs were for reasons unrelated to the training). Based on these findings, Ericsson and Pool (2016) argued that "perfect pitch is not the gift, but, rather, *the ability to develop perfect pitch* is the gift—and, as nearly as we can tell, pretty much everyone is born with that gift" (p. xvi). Clearly, no one is born with a "prepackaged" ability to identify musical pitches; some exposure to music is required to acquire this skill. However, based on Sakakibara's findings, Ericsson and Pool's claim that "pretty much anyone" is born with the ability to develop this skill is unjustified because the children in the study were not representative of the general population—they were pupils in a private music school and may have been high on music aptitude, among other factors. It is also not clear that the children exhibited perfect pitch, because the criterion test assessed children's ability to identify a limited number of pitches. Finally, while the findings do demonstrate that it is possible to teach people how to identify musical pitches, there was a large amount of variability in the amount of time it took them to complete the training: from around 2 years to *8 years* (see Figure 13.3). Thus, there would appear to be factors that interact with training to influence acquisition of this skill.

Figure 13.3: Histogram depicting time to completion of pitch identification training in Sakakibara's (2014) study (*N* = 22).

Taken together, the available evidence suggests that deliberate practice is not as important a predictor of individual differences in expertise as Ericsson and colleagues originally argued. Ericsson has responded to this theoretical challenge with a vigorous defense of his view (Ericsson, 2014; Ericsson, 2016). However, his defense has been undermined by repeated contradictions, inconsistencies, and material errors in his arguments (see Hambrick et al., 2014; Hambrick et al., 2016; Macnamara, Hambrick, & Moreau, 2016). Most notably, Ericsson's definition of deliberate practice and his criteria for determining whether an activity qualifies as deliberate practice have shifted, making it difficult to test claims about the importance of deliberate practice (see Macnamara et al., 2018, for a discussion). For a theory to remain scientifically viable, theoretical terms must be used in consistent ways.

Two limitations of past research on deliberate practice should be noted, as well. The first is that Ericsson and colleagues have built the case for their view almost entirely on correlational evidence—that is, the finding of positive correlations between deliberate practice and performance from cross-sectional studies in which people representing different levels of skill estimate their past engagement in deliberate practice. The problem with this is that people may differ in accumulated amount of deliberate practice *because they differ in aptitude (or talent) for the domain*. As Sternberg (1996) noted, "deliberate practice may be correlated with success because it is a proxy for ability: We stop doing what we do not do well and feel unrewarded for" (p. 350). And as Winner (2000) added,

Hard work and innate ability have not been unconfounded. Those children who have the most ability are also likely to be those who are most interested in a particular activity, who begin to work at that activity at an early age, and who work the hardest at it. Ericsson's research demonstrated the importance of hard work but did not rule out the role of innate ability. (p. 160)

Responding to this point, Ericsson argued that "[d]eliberate practice does not involve a mere execution or repetition of already attained skills but repeated attempts to reach beyond one's current level which is associated with frequent failures" (Ericsson, 2007, p. 18). Ericsson's argument seems to be that, because deliberate practice is not simply "more of the same" but rather is designed to push a person's performance to new heights, there should be no relationship between past performance in a domain and engagement in deliberate practice. This claim has the appearance of being a logical argument—but it is not, and it is also implausible. What seems more likely is that a person who has experienced a great deal of success in a domain will be more likely than a person who has experienced little success to engage in activities designed to elevate their performance, for the simple reason that they have more reason to do so. To illustrate, imagine two high school basketball players. One is among the best players in the state and is a top prospect for a college scholarship; the other is the worst player on his team—a "benchwarmer." Who seems more likely to engage in a grueling regimen of deliberate practice to elevate his current level of performance—the superstar or the benchwarmer?

The second limitation of past research on deliberate practice is that nearly all studies of the relationship between deliberate practice and performance—beginning with Ericsson et al.'s (1993) study of musicians—have relied on retrospective self-reports to assess deliberate practice. That is, people are asked to estimate how much they have practiced in the past. To be sure, some procedures (e.g., structured interviews) may yield more accurate estimates than others (e.g., brief questionnaires). However, no method can guarantee perfectly accurate retrospective estimates of practice. (Imagine being asked to estimate how much time you spent practicing the piano or a sport when you were 10 years old. Could you do so with much confidence?) Furthermore, rather than relying on their memory to generate practice estimates, people may base their estimates on their current skill level, and their beliefs about the importance of practice may influence their estimates. For example, a person who believes that practice is the most important factor in developing expertise may overestimate their past engagement in practice, whereas a person who believes that talent is the most important factor may underestimate it. The degree to which these biases influence estimates of the correlation between deliberate practice and performance is unknown. The relationship between deliberate practice and performance could be stronger than current estimates indicate, but it could just as well be weaker.

# 13.3 Beyond the Deliberate Practice View

To sum up, Ericsson and colleagues' deliberate practice view is not supported by the available evidence: however operationally defined, deliberate practice leaves a large amount of the between-person variability in expertise unexplained. Thus, while deliberate practice may be an important predictor of individual differences in expertise, it is not the only important predictor or even necessarily the largest. Furthermore, Ericsson and colleagues' case for the importance of deliberate practice is based almost entirely on correlational evidence that does not rule out an influence of aptitude.

# 13.3.1 The Multifactorial Gene-Environment Interaction Model

Expanding on existing theory (e.g., Gagné, 2013), the Multifactorial Gene-Environment Interaction Model (MGIM) of expertise provides a framework for thinking about what factors influence expertise (Ullén, Hambrick, & Mosing, 2016). As shown in Figure 13.4, the MGIM assumes that (1) both domain-general traits and domain-specific knowledge influence expertise (i.e., domain-specific performance); (2) these factors may influence expertise both indirectly and directly; and (3) genetic and environmental factors operate together to produce individual differences in expertise.

At the core of the MGIM is the concept of *gene-environment interplay*, including both gene-environment correlation (*r*GE) and gene-environment interaction (*G* × *E*). As illustrated in Figure 13.5, *r*GE occurs when people are exposed to different environments as a systematic function of their genetic differences rather than at random

Figure 13.4: The Ullén-Hambrick-Mosing multifactorial gene-environment interaction model (MGIM) of expertise (used with permission of Routledge from Hambrick, Campitelli, & Macnamara, 2018).

(Plomin, DeFries, & Loehlin, 1977). There are three types of *r*GE, each of which can be seen as fundamental for understanding the development of expertise (see Tucker-Drob, 2018). The first is *passive r*GE: parents create a home environment that is influenced by their own genetic characteristics, which they pass to their children. For example, parents who have high levels of music aptitude may create a musically rich environment for their children. The second is *active r*GE: a person's genetically-influenced traits lead him or her to actively seek out certain experiences. For example, a child with a high level of music aptitude may beg his or her parents for music lessons and seek out musical experiences on his or her own. The final type is *evocative r*GE: a person's genetically-influenced characteristics elicit particular reactions from other people. For example, a child possessing a high level of music aptitude may be noticed by music teachers, who provide special opportunities for the child to develop musical expertise.

*G*×*E*, on the other hand, occurs when the magnitude of genetic influence on an outcome varies as a function of the type or amount of an environmental experience. (In Figure 13.5, *G*×*E* is illustrated with intersecting G and E pathways.) In the context of developing expertise, *G*×*E* could occur if training diminished genetic influence on performance.

Ericsson et al. (1993) alluded to this possibility when they claimed that general cognitive ability, which is genetically influenced, is predictive of performance in the initial stages of skill acquisition, but then loses its predictive power (see also Ericsson, 2014). Alternatively, *G*×*E* could occur if training enhanced genetic influence on performance. For instance, while Ericsson (2007) claimed that deliberate practice activates "dormant genes that all healthy children's DNA contain" (Ericsson, 2007, p. 4, emphasis added), it may also activate otherwise dormant genes, variants of which *differ* across individuals.

### 13.3.2 Evidence for Genetic Influence

The basic goal of behavioral genetic research is to explain variation across people in some phenotype—an observable behavior or characteristic—in terms of variation in those people's genotypes, their genetic makeup (Knopik, Neiderhiser, DeFries, & Plomin, 2016). The most commonly used behavioral genetic research design is the twin study, which compares identical twins with fraternal twins (for reviews, see Mosing & Ullén, 2016; Mosing, Peretz, & Ullén, 2018). Identical twins are monozygotic (MZ), meaning that they were derived from a single ovum and share 100% of their genes, whereas fraternal twins are dizygotic (DZ), meaning that they were derived

Figure 13.5: Illustration of gene-environment interplay, including gene-environment correlation (*r*GE) and gene × environment correlation (*G*×*E*), in the context of the development of musical expertise.

from separate ova and share only 50% of their genes on average. Thus, to the extent that variation in a trait is influenced by genes, MZ twins should be more similar to each other on that trait than DZ twins are. In statistical terms, the MZ correlation should be greater than the DZ correlation.
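The logic behind this comparison can be turned into a back-of-the-envelope calculation. Falconer's classic formula estimates heritability as twice the difference between the MZ and DZ correlations. The sketch below uses made-up twin correlations for illustration, and it relies on simplifying assumptions (purely additive genetic effects and equally similar environments for MZ and DZ pairs):

```python
def falconer_estimates(r_mz, r_dz):
    """Rough variance decomposition from twin correlations (Falconer's formulas).

    Assumes purely additive genetic effects and the "equal environments"
    assumption (shared environment is equally similar for MZ and DZ pairs).
    """
    h2 = 2 * (r_mz - r_dz)  # heritability: share of variance due to genes
    c2 = r_mz - h2          # shared (family) environment
    e2 = 1 - r_mz           # non-shared environment plus measurement error
    return h2, c2, e2

# Hypothetical correlations for a music-aptitude measure (not real data):
h2, c2, e2 = falconer_estimates(r_mz=0.70, r_dz=0.45)
print(round(h2, 2), round(c2, 2), round(e2, 2))  # 0.5 0.2 0.3
```

Because the MZ correlation exceeds the DZ correlation, the formula attributes part of the variance to genes; if the two correlations were equal, the heritability estimate would be zero.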

There is evidence from twin studies for a genetic influence on individual differences in expertise. Coon and Carey (1989) used a sample of over 800 twin pairs to estimate the heritability of musical accomplishment. The twins completed a survey to determine whether they were identical or fraternal, and then completed a survey that included several questions about both music accomplishment and music practice. For a measure of musical achievement, the heritability estimate was 38% for males and 20% for females. In another twin study, Vinkhuyzen, van der Sluis, Posthuma, and Boomsma (2009) analyzed data from a study in which 1,685 twin pairs rated their competence in chess, music, and several other domains. Heritability ranged from 50% to 92% for endorsement of *exceptional talent*.

More recently, in a large sample of adolescent twins, Plomin and colleagues found that genetic factors accounted for over half of the variation between expert and less skilled readers, where experts were defined as individuals who scored above the 95th percentile on a standardized test of reading ability (Plomin, Shakeshaft, McMillan, & Trzaskowski, 2014). Drayna, Manichaikul, de Lange, Snieder, and Spector (2001) reported heritability estimates of 80% for performance on the Distorted Tunes Test, which requires the participant to identify incorrect pitches from familiar melodic stimuli.

There is also emerging evidence for *r*GE and *G* × *E* in the development of expertise (see Mosing & Ullén, 2016; Mosing, Peretz, & Ullén, 2018). As noted above, using data from the National Merit twin sample, Coon and Carey (1989) found heritability estimates of 38% for males and 20% for females for music achievement. In a more recent analysis of this dataset, Hambrick and Tucker-Drob (2015) found that heritability was substantial not only for musical achievement (26%), but also for a measure of music practice (38%). This finding is readily interpretable as an instance of *r*GE—the idea that people's genotypes influence whether they engage in music practice. More generally, as mentioned earlier, a person with high aptitude for some activity is probably more likely to practice that activity than a person with lower aptitude (see Sternberg, 1996). Hambrick and Tucker-Drob also found evidence for *G*×*E*: the heritability of musical accomplishment was higher for a group that reported practicing regularly than for a group that did not. This evidence is in line with an earlier twin study on training of the rotary pursuit task, which found that genetic influences on performance as well as learning rate increased after three days of training (Fox, Hershberger, & Bouchard, 1996).
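The same twin logic can illustrate what *G*×*E* means in this context: heritability estimated separately within two environments. The twin correlations below are invented for illustration and are not the actual estimates from Hambrick and Tucker-Drob (2015):

```python
def heritability(r_mz, r_dz):
    # Falconer's estimate: h2 = 2 * (r_MZ - r_DZ)
    return 2 * (r_mz - r_dz)

# Hypothetical twin correlations for musical accomplishment,
# split by self-reported practice (illustrative numbers only):
groups = {
    "practices regularly": {"r_mz": 0.75, "r_dz": 0.40},
    "does not practice":   {"r_mz": 0.55, "r_dz": 0.40},
}
for label, r in groups.items():
    print(f"{label}: h2 = {heritability(r['r_mz'], r['r_dz']):.2f}")
```

Higher heritability in the practicing group is the signature of this form of *G*×*E*: training amplifies, rather than erases, genetic differences.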

In a much larger study, Mosing, Madison, Pedersen, Kuja-Halkola, and Ullén (2014) had over 10,000 twins complete a test of musical aptitude (the Swedish Musical Discrimination Test). Heritability was 50% for rhythm discrimination, 59% for melody discrimination, and between 12% and 30% for pitch discrimination; heritability of accumulated amount of music practice averaged around 50%. Furthermore, intra-twin-pair modeling revealed that identical twins who differed massively in accumulated amount of music practice did not differ significantly on the tests of music aptitude. Thus, while certain types of knowledge and skill necessary to play music at a high level must be acquired (e.g., how to read music), basic sensory capacities involved in playing music may not be influenced by music practice.

Taken together, findings of these twin studies indicate that there are both direct and indirect effects of genetic factors on expertise. More specific information about the role of genetic factors in expertise comes from molecular genetics, a type of behavioral genetic research that seeks to identify associations between specific genes and performance. In a series of studies, North and colleagues documented correlations between genotype for the ACTN3 gene, which codes for the alpha-actinin-3 protein in fast-twitch muscle fibers, and performance in various sprint events. For example, in one study (Yang et al., 2003), only 6% of 107 elite athletes from various short-distance events had a variant of ACTN3 that made them alpha-actinin-3 deficient, compared to 18% of control subjects. Furthermore, *none* of the most elite athletes in the sample—the 32 Olympians—were alpha-actinin-3 deficient.

There is also an emerging molecular genetic literature on music (see Tan, McPherson, Peretz, Berkovic, & Wilson, 2014, for a review). Di Rosa and colleagues (Di Rosa, Cieri, Antonucci, Stuppia, & Gatta, 2015) used Ingenuity Pathway Analysis (IPA), a procedure for identifying links between biological functions and genes, to identify possible interactions between genes potentially related to musical ability and those deleted in individuals with Williams Syndrome—a genetic disorder that is associated with serious deficits in some cognitive domains but surprisingly good musical skills. Di Rosa et al. reported a potential interaction between a gene related to Williams Syndrome (STX1A) and one related to music skills (SLC6A4). Both of these genes are involved in serotonin transporter expression, suggesting that serotonin may be involved in the development of musical abilities.

# 13.3.3 The Future of Genetic Research on Expertise

Expertise is a complex phenotype. For example, expertise in a sport reflects multiple, interacting cognitive, motoric, and perceptual subcomponents, each of which may be influenced by different genetic factors. Consequently, it is unreasonable to expect that scientists will ever discover a single genetic variant (or even a small number of genetic variants) that will account for all, nearly all, or even most of the phenotypic variance in expertise in various domains. Instead, what Chabris and colleagues have termed the Fourth Law of Behavioral Genetics will almost certainly hold true for expertise: "A typical human behavioral trait is associated with very many genetic variants, each of which accounts for a very small percentage of the behavioral variability" (Chabris, Lee, Cesarini, Benjamin, & Laibson, 2012, p. 305).

Just as astronomers may never fully understand the exact sequence of events leading to the creation of the universe, expertise researchers may never be able to fully explain how genetic factors translate into exceptional performance in complex domains. The task may exceed the powers of scientific imagination, not to mention computing power. However, just as astronomers will not abandon the idea that the universe can be explained in physical terms, expertise researchers should not abandon the idea that genetics must play an important role in expert performance. Moreover, just as neuroscientists do not wait for a complete understanding of how the brain controls thought and behavior to apply their findings to practical problems (e.g., diagnosis, treatment), expertise researchers should not wait for a complete understanding of how genetics influences expert performance to begin making practical use of findings from behavioral genetics. For example, using information about gene-environment interplay, it may one day be possible to tailor training to people's genotypes across a range of domains, as is already being done in sports (e.g., Mann, Lamberts, & Lambert, 2011). This type of intervention promises to bring high levels of performance within the reach of more people than is currently the case. As Plomin (2018) noted:

The importance of gene-environment correlation suggests a new way of thinking about the interface between nature and nurture that moves beyond a passive model, which assumes one-size-fits-all training regimes that are imposed on individuals, to an active model in which people select, modify, and create their own environments that foster the acquisition of expertise, in part on the basis of their genetic propensities. (p. xvi)

Scientific understanding of the genetics of expertise will presumably always be incomplete, but this is no reason to delay capitalizing on knowledge from this area of research to inform the design of applications that can make people's lives and society better.

#### 13.4 Conclusions

From prehistory to the present, people have probably always been interested in the origins of expertise. For nearly a century, the nurture view of expertise has held sway in psychology. This view argues that individual differences in expertise overwhelmingly reflect the role of environmental factors, with no important role for genetic factors. Most notably, over the past 25 years, Ericsson and colleagues' (Ericsson et al., 1993) deliberate practice view has had a major impact on both scientific and popular views on the nature and origins of expertise. With the caveat that the evidence is almost entirely correlational, research inspired by this view suggests that training history may well be an important determinant of individual differences in expertise. At the same time, the available evidence indicates that training history is probably not as important as Ericsson and colleagues have argued—and that other factors are probably *more* important than they have argued, including genetically-influenced abilities and capacities. Accordingly, my colleagues and I have argued that the science of expertise must embrace the idea that the origins of expertise can never be adequately understood by focusing on a single determinant, or a single class of determinants (see Hambrick et al., 2016; Ullén et al., 2016). We believe that research guided by this perspective will shed new light on the factors that contribute to expertise, which in turn will provide solid scientific grounding for interventions to accelerate the acquisition of expertise.

#### Summary

Scientific research on human expertise focuses on the nature and origins of complex skill in domains such as music, sports, and games. A central question in this area of research is why some people reach a higher level of ultimate performance than do other people in these domains. Research reveals that training history cannot account for all, or even most, of the differences across people in expertise. The practical implication of this finding is that people may require vastly different amounts of training to reach a given level of skill. This chapter describes a multifactorial perspective on expertise, which seeks to identify all factors contributing to individual differences in expertise, including both experiential factors ("nurture") and basic abilities and capacities ("nature").

#### Review Questions


#### Hot Topic

Zach Hambrick

Though I can hardly believe it, I have been studying the same topic (expertise) for nearly 25 years—since my first year of graduate school at Georgia Tech, in 1995. Time flies when you're having fun. These days, I am fortunate to have a job as a professor. However, my daily activities as a researcher are much the same as they were when I was a graduate student.

Most days, I write something having to do with my research. This includes working on manuscripts of various types, including scientific reports of research from my lab, book chapters like the one you are reading right now, and grant applications to secure funding for my lab. It also includes writing reviews of manuscripts I have been asked to evaluate for publication in scholarly journals (having an expert from the field evaluate a manuscript that another researcher has submitted to a journal for publication is called "peer review"). Over the years, I have written hundreds of reviews. I can't say that this is my favorite task, but it's an essential form of professional service, and I take it seriously (after all, someone has taken time out of their busy schedule to review my manuscript submissions). I also do a lot of writing in my role as editor of the *Journal of Expertise*. Of course, I also spend a good deal of time on any given day reading what other researchers have written.

I also spend a great deal of time interacting with my students and colleagues about various aspects of the dozen or so research projects that we have going on at any given time. We discuss (in person, or via Skype or e-mail) everything from the logistics of recruiting participants for a project, to questions about how best to analyze data we have collected, to conceptual issues at the core of designing a project. This also includes what is undoubtedly the most important part of my job: mentoring. Whether formally or informally, I advise students almost every day. This is the part of my job that I love the most. More than 20 years ago, my mentors took time out of their busy schedules to help me develop my ideas for research, to read drafts of my manuscripts, and to give me career advice. I can't thank my mentors enough for the help they gave me, and I try to do the same.

# References





# Glossary


genotype A person's unique genetic makeup.


# Chapter 14

# Intelligence

#### OLIVER WILHELM & ULRICH SCHROEDERS

Ulm University & University of Kassel

Most other chapters in this volume tackle the nature of human thinking from the perspective of cognitive psychology (for example, how humans draw deductive inferences). In most of these chapters, human subjects are treated uniformly; that is, an attempt is made to describe and explain the cognitive principles of deductive reasoning that are common to all people. In this chapter, the focus will instead be placed on what makes people different: individual differences between persons. The areas of cognition discussed in most other chapters provide the required background information on what exactly humans engage in while working on an intelligence test. Whereas some measures stress deductive inference, others might provoke complex problem-solving behavior. The focus of this chapter is to study why some subjects answer such intelligence items correctly while others get them wrong, and why these differences are meaningful and interesting.

In the first section of this chapter, we will approach the concept of intelligence by briefly summarizing the history of relevant psychometric intelligence models. While a historical overview might seem somewhat inappropriate in an introductory chapter, in the case of intelligence research this perspective provides us with a set of competing accounts essential for understanding intelligence data and intelligence theories. We will then proceed by describing an established taxonomy of intelligence factors and discuss intelligence as an overarching concept for all measures that provoke maximal cognitive effort. The second section will be very pragmatic, showing how intelligence can be measured, how it can be used for predictive purposes, and whether it can be changed through interventions. We will conclude the chapter by examining important issues for future intelligence research. A field as broad and well developed as intelligence and its assessment cannot be addressed exhaustively in an introductory chapter. We hope that the references provided in this chapter will be helpful for further reading and will enrich one's understanding of contemporary research in the field.

# 14.1 Understanding Intelligence

Research on the structure of individual differences in intelligence follows an atypical strategy relative to most other psychological research. Typically, theories and hypotheses are proposed first, followed by the development of adequate means for testing and evaluating them; intelligence research instead proceeds in reverse. For instance, at the beginning of intelligence research as an independent field, factor analytic methods were invented and refined, with corresponding theories of intelligence developed afterwards. This approach to intelligence research places the focus on competing explanations of individual differences in a broad variety of intelligence tasks. Unfortunately, we must skip some important early contributions: for example, Galton (1883) developed several simple tests of intellectual functioning and made early contributions to the study of the heredity of intelligence. Binet deserves credit for compiling one of the first intelligence tests, although his efforts could hardly be considered a clear theoretical contribution to the structure of intelligence. Moreover, Ebbinghaus (1895) developed several intelligence tests that were reused in other fields before making a much later comeback in intelligence research (Ackerman, Beier, & Bowen, 2000).

# 14.1.1 The History of Intelligence Models and the Usual Suspects

In the following section, we will present different ways of conceptualizing intelligence (see Figure 14.1). We start with Spearman (1904a, 1904b), who made two seminal methodological contributions in the year he completed his dissertation (on a completely different topic) with Wilhelm Wundt in Leipzig. In one of these contributions, he laid the foundation for what is known today as classical test theory (Lord & Novick, 1968). In the other, he established the groundwork for the general factor theory. This theory is based on two central assumptions. First, a latent factor (*g*) accounts for the correlations between all intelligence tasks. Second, besides this general factor, there are test-specific individual differences. Apart from these two components, there is only unsystematic measurement error in intelligence tasks.

The first assumption reflects an idea prevalent throughout research on individual differences that applies to traits such as extraversion and achievement motivation: the assumption that there is a stable disposition within persons to act in specific ways. In the case of the *g*-factor, this disposition is to do well

Figure 14.1: Psychometric models of intelligence.

on tasks requiring cognitive effort. This disposition is deemed causal for the correctness or swiftness of responses on each intelligence item. The idea of a latent trait also applies to most other theories of intelligence structure. Usually, these traits are deemed stable over time; that is, the rank-order of subjects does not change dramatically over time. They are considered broad, in the sense that they apply not only to a highly specific test but also to similar examinations. They are expected to be relevant, meaning they predict real-life outcomes that are of individual or societal relevance. In the case of the *g*-factor, only one such latent variable is specified for the field of intelligence. Spearman's theory states that the correlation between any two intelligence tasks is due to the *g*-factor (Figure 14.1, panel A). As a side note, the *g*-factor theory competed in the early days of intelligence testing with the so-called bond theory (Thomson, 1919). The bond theory stated that the magnitude of the correlation between any two intelligence tasks indicates the proportion of overlapping processes—the higher the correlation, the larger the number of shared processes. This approach has recently gained renewed attention (van der Maas et al., 2006). Taking a more cognitive perspective, the componential theory of intelligence proposes that the correlation between two intelligence tasks is a function of shared components; the theory was put to the test, for example, in the area of analogical problem solving (Sternberg, 1977). A somewhat related approach pursued with different methods is called facet theory. Here, the overlap of task attributes determines the correlations between intelligence tasks, and the magnitude of correlations is graphically represented by proximity (Guttman & Levy, 1991).
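Spearman's two assumptions have a simple quantitative consequence that he himself exploited. If each standardized test score is the sum of a *g* component and a test-specific component, the model-implied correlation between two different tests is just the product of their *g*-loadings, and any four tests satisfy the famous "vanishing tetrads" constraint. The loadings below are made-up illustrative values, not real test data:

```python
# Illustrative g-loadings for four hypothetical tasks (not real test data):
lam = {"vocabulary": 0.8, "analogies": 0.7, "arithmetic": 0.6, "mazes": 0.5}

def implied_corr(task_a, task_b):
    """Correlation implied by a one-factor model: r_ij = lam_i * lam_j (i != j)."""
    return 1.0 if task_a == task_b else lam[task_a] * lam[task_b]

# Spearman's historical check of the model: the "tetrad difference"
# r(1,2)*r(3,4) - r(1,3)*r(2,4) must vanish if a single factor g
# accounts for all correlations.
tetrad = (implied_corr("vocabulary", "analogies") * implied_corr("arithmetic", "mazes")
          - implied_corr("vocabulary", "arithmetic") * implied_corr("analogies", "mazes"))
print(abs(tetrad) < 1e-12)  # True: the one-factor structure holds exactly
```

With real data, tetrad differences are only approximately zero, and systematic departures from zero were one early argument for group factors beyond *g*.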

A major competitor of the *g*-factor theory arose with the development of multiple factor analysis (Thurstone, 1931). This procedure allowed for the extraction of more than one factor (Figure 14.1, panel B). Combined with the development of factor rotations, it greatly facilitated the interpretation of intelligence factors (Thurstone, 1934). Thurstone subsequently proposed seven primary mental abilities (Thurstone, 1938): reasoning, spatial visualization, verbal comprehension, word fluency, number facility, associative memory, and perceptual speed—a list he updated and extended jointly with his wife three years later (Thurstone & Thurstone, 1941).

Disentangling different contributions to performance on intelligence tasks was also the main purpose of the so-called bifactor approach (Holzinger & Swineford, 1937). Similarly, Schmid and Leiman (1957) proposed rotation techniques to distinguish between independent performance contributions to individual differences in intelligence tasks. Both approaches (Figure 14.1, panel C) are early hierarchical perspectives on intelligence.

Higher-order factor models are another way to conceptualize intelligence, because the ubiquitous positive correlation between any two intelligence tasks also leads to correlations between intelligence factors. These factor correlations are the basis for higher-order models of intelligence (Figure 14.1, panel D). In these models, a second-order factor accounts for the correlations between first-order factors, which in turn account for the correlations between intelligence tasks (Carroll, 1993).

# 14.1.2 Accepted Views on the Structure of Intelligence

Among the more contemporary models, Cattell's theory of fluid and crystallized intelligence (Cattell, 1971; see also Brown, 2016) has become a widely accepted and applied model for the description and testing of intelligence. The gf-gc theory also heavily stimulated theory building, as can be seen in the investment theory (Cattell, 1971) or the PPIK theory (Intelligence-as-Process, Personality, Interests, and Intelligence-as-Knowledge; Ackerman, 1996). Furthermore, the integration of the gf-gc theory into personality research and its validation and use in aging research have contributed to its popularity. In its current version, the gf-gc theory assumes nine primary factors (McGrew, 2009), of which fluid and crystallized intelligence are central (see Table 14.1).

A closely related milestone in intelligence research is the seminal work of Carroll (1993). The comprehensive synopsis and reanalysis of decades of factor-analytic intelligence research and the theory-guided integration of these findings led to a structural model that, in view of the factors postulated, bears much resemblance to the model of


Table 14.1: Overview of the central factors of cognitive ability.

*Note*. Labels in the first column are taken from the CHC model.

Cattell and Horn. Carroll (1993) reanalyzed 461 data sets from factor-analytic intelligence research, covering diverse populations, countries, decades, and the full variety of cognitive tasks developed by that time. To this day, Carroll's is most likely the most comprehensive overview of cognitive ability measures ever compiled. His analyses led to a structural model distinguishing three levels of generality (see Figure 14.2).

At the middle level of generality, eight broad ability factors are distinguished (see Table 14.1). Once again, any two intelligence tasks will always show a positive correlation, and these eight factors will therefore show positive manifold. This positive manifold is captured with an overarching general intelligence factor at the apex of the higher-order model of intelligence. Such models have become more prevalent and popular recently (e.g., Gustafsson, 1999), because they a) explicitly address and capture the substantial positive correlations between intelligence tasks and intelligence factors, and b) deliver the best from the two worlds of group factor theories and a general factor theory. In pragmatic terms, the factors from the middle level of generality are not all of equal importance. Whereas fluid and crystallized intelligence are indispensable in intelligence tests, other factors are mostly needed to give a comprehensive picture of an individual's cognitive abilities.

Figure 14.2: A slightly revised version of Carroll's Three-Stratum Theory.

Unsurprisingly, fluid and crystallized intelligence (and mixtures of both factors) are also most predictive for outcomes such as educational achievement or job performance. Please note that fluid intelligence has repeatedly been found to show the strongest relation with the overarching general factor. Therefore, if only a single task can be used to measure intelligence, your choice should be to pick a fluid intelligence task.

At the lowest level of the hierarchy there are many specific intellectual abilities that serve to underline the breadth of factors at the middle level and to illustrate the exhaustiveness of the model. Taken together, the work of Cattell, Horn, and Carroll by and large converges on the model shown in Figure 14.2. Discussion of research on this model integrates and successively extends the common ground on individual differences in intelligence (McGrew, 2009). In the current version of the model, more specific abilities, such as specialized knowledge in the sense of expertise or reading and writing skills, have been included. Importantly, in the last two decades, popular and frequently used intelligence tests switched to the Cattell-Horn-Carroll (CHC) model—a change that was desperately needed for various Wechsler tests in particular.

Despite its unifying character, the CHC model must not be misunderstood as a final model of intelligence structure. There are many open questions, some of which we will discuss in later sections of this chapter. In addition, our presentation of intelligence relies on psychometric, mainly factor-analytical approaches to studying individual differences in cognitive abilities. However, we want to mention that there are several theories of intelligence that cannot be given full consideration in the course of an introductory chapter. A theory that is popular, especially among educators and teachers, is the theory of "Multiple Intelligences" by Gardner (1983, 1991), who advocated against g, proposed distinct forms of intelligence, and claimed that students can be categorized into eight different types of learners (i.e., visual-spatial, bodily-kinesthetic, musical-rhythmic, interpersonal, intrapersonal, verbal-linguistic, logical-mathematical, naturalistic). However, multiple intelligences appear to be a blend of g, broad ability factors below g, and other non-cognitive factors (Visser, Ashton, & Vernon, 2006), and there is no adequate empirical evidence to justify incorporating learning styles into education (Pashler, McDaniel, Rohrer, & Bjork, 2008).

The concept of emotional intelligence has also gained considerable attention (Salovey, Mayer, & Caruso, 2004) and received substantial criticism (Davies, Stankov, & Roberts, 1998). It is argued to comprise the abilities to perceive emotions; to access, generate, and use emotions; to understand and regulate emotions; and, finally, to draw on knowledge about emotions (Salovey et al., 2004). For most of these abilities it is difficult to come up with an unequivocal response standard; i.e., what might work to regulate person A's emotions might be counterproductive for person B. Nevertheless, recent efforts to include some aspects of emotional intelligence in a higher-order model of intelligence were successful (MacCann, Joseph, Newman, & Roberts, 2014), and future research in this area might be promising.

# 14.1.3 Intelligence as Overarching Concept of Maximal Cognitive Effort

Our discussion of intelligence has yet to include an actual, clear definition of intelligence. Indeed, prior attempts at specifying what intelligence is and what it is not met with limited success. The infamous definition that intelligence is what the test measures (Boring, 1923) raises the question of which tasks or factors of intelligence are indispensable and what should not be part of the concept "intelligence". In response to public controversy over the term intelligence, Gottfredson and 52 other researchers (1997, p. 13) gave a very broad definition of intelligence: "A very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience." Similarly, Neisser et al. (1996, p. 77) defined intelligence as individual differences between persons "[. . . ] in their ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought." The essential components of these definitions center on aspects of fluid intelligence and neglect other factors we described above. In addition, both definitions are opaque with respect to concepts such as ability, achievement, aptitude, competence, proficiency, talent, skill, and so on (Schroeders, 2018). Some of these terms are used in specific research traditions or serve to set a specific focus. For example, *competency* or *proficiency* is preferred in an educational setting because, within the spectrum of abilities under consideration, the abilities trained in formal learning (e.g., schooling) are perceived as malleable and acquired. In contrast, *talent* often labels an inherited or exceptional ability (e.g., musical or artistic talent). The subtle nuances between these concepts, which are all related to effortful cognitive processing, are best seen in the context of the research tradition from which they originate. If you were asked to classify existing measures of intelligence, competence, aptitude, skill, etc., you would hardly be able to come up with a dependable classification of tasks. Therefore, these terms should be characterized as "fuzzy" and insufficient when it comes to explaining relations between tasks or assigning tasks to factors.

In order to derive a dependable and inclusive understanding of what constitutes an intelligence task, we recommend using intelligence as an overarching concept of maximal cognitive effort. The distinction between typical behavior and maximal cognitive performance dates back to Cronbach (1949): typical behavior refers to the ways individuals usually behave and what they like or dislike. It is usually captured through self-reports on behaviors, preferences, and valences. For example, the question "Do you like solving math puzzles?" arguably describes an individual's preference for engaging in mathematical problem solving. Responses to such questions presuppose the willingness and ability of subjects to introspect. Moreover, these responses are very vulnerable to subjective judgments and biases (e.g., social desirability). In contrast, maximal cognitive performance refers to the measurement of abilities, achievements, skills, talents, etc. An item such as "What is the solution to *f*(x) = 3x<sup>2</sup> + 12?" differs fundamentally from the assessment of typical behavior in several ways. Items of maximal behavior will only be used in contexts in which a) the person being examined is aware of the performance appraisal, b) the person is willing and able to show maximal cognitive effort, and c) the standards for evaluating the response behavior are adequate for the purpose of making a diagnostic judgment (Sackett, Zedeck, & Fogli, 1988). Preferably, objectively correct solutions are used as a benchmark for actual response behavior. In some domains, providing a veridical response standard is not feasible. For example, it is very difficult to provide such a standard for written essays and for tasks designed to tap into interpersonal intelligence factors such as understanding emotions (for a recent though incomplete summary concerning intrinsically personal tests, see Mayer, 2018). Rather, these tasks often rely on situational judgment methodology (Oostrom, DeSoete, & Lievens, 2015).

One important aspect that we want to stress is the unfortunate division between psychological and educational testing of maximal effort concepts. More than a century ago, Binet (1904) distinguished between medical, pedagogical, and psychological methods in intelligence testing. The medical method aims "to appreciate the anatomical, physiological, and pathological signs of inferior intelligence" (Binet, 1904, p. 194). Thus, this method will receive no further consideration in this chapter. The psychological method "makes direct observations and measurements of the degree of intelligence" (Binet, 1904, p. 194) and focuses on reasoning and memory-related abilities. The pedagogical method "aims to judge intelligence according to the sum of acquired knowledge" (Binet, 1904, p. 194). It is clear from our earlier presentation of essential intelligence factors that the psychological and the pedagogical methods roughly correspond to fluid and crystallized intelligence, respectively. This early distinction by Binet, unfortunately, led to a subsequent separation of efforts related to his two methods. Consequently, fluid intelligence and equivalent concepts such as decontextualized thinking, academic intelligence, etc., are hardly accepted as determinants of educational outcomes and have often been considered taboo in an educational context. Conversely, elaborating on crystallized intelligence or related concepts such as expertise and how they could enrich cognitive ability testing has yet to become popular in psychometric research contexts. Unfortunately, the separation between these two fields has yet to be overcome. As a remedy, we propose that the term intelligence be used as an overarching concept that encompasses mechanical abilities, such as fluid intelligence, memory, and processing speed, as well as knowledge-driven aspects, such as crystallized intelligence with its myriad facets.

Next, we want to relate intelligence assessment to educational assessment to illustrate the overarching/unifying aspect of intelligence. The debate regarding the extent to which intelligence tests and educational achievement tests measure the same underlying abilities has a long history (Baumert, Lüdtke, Trautwein, & Brunner, 2009). We propose that the problem of distinguishing between intelligence tests and other measures for assessing cognitive abilities (e.g., educational achievement tests) is not whether a person's scores on both methods are perfectly correlated (Bridgeman, 2005). To understand differences between both fields, it is more instructive to study the attributes in which such measures differ: for example, where they are located on the continuum from "decontextualized" to "contextualized" and which predictions the contextualization of measures affords (Brunner, 2008). This approach clearly places the competencies studied in educational psychology below crystallized intelligence. For example, Baumert and colleagues (2009) suggested that international education studies, such as PISA (Programme for International Student Assessment), primarily capture the cumulative outcomes of a knowledge acquisition process. This understanding of competence is broadly identical to Cattell's (1971) definition of crystallized intelligence, according to which crystallized intelligence encompasses the totality of knowledge that people acquire and use to solve problems throughout their lives. Whereas the nature and content of educational tests are usually carefully studied, many traditional tests of crystallized intelligence neglect content validity—a lesson that can and should be learned from educational testing.

We advise against relying on a test's purpose to understand what the test measures. College admission tests do not measure the ability to study. Such tests usually include measures of fluid intelligence along with domain-specific crystallized intelligence tasks. School readiness tests do not capture the ability to attend school—instead, they are best seen as a composite of gc tasks and social skills. If you want to understand what a measure of maximal cognitive performance captures, it is not wise to focus on the purpose of testing. Instead, it will be more useful to classify a measure according to the intelligence factors described here.



Figure 14.3: Example item for fluid intelligence: verbal, numeric, and figural.

# 14.2 Measuring and Using Intelligence

# 14.2.1 Tasks for Measuring Intelligence

In this section, we introduce selected intelligence tasks designed for use with adults and discuss the cognitive demands of these tasks. We focus on the two factors of fluid and crystallized intelligence because these factors are the most decisive and important predictors in most applied settings, such as college admission or personnel selection.

#### 14.2.1.1 Tasks for Measuring Fluid Intelligence

Earlier in this chapter, we argued that a fluid intelligence task should be chosen when only a single task can be used to measure intelligence. Such a fluid intelligence task would then serve as a marker task for intelligence. Below the fluid intelligence factor, Carroll (1993) distinguished three reasoning factors:


• *Sequential reasoning* tasks require deductive, step-by-step inference from explicitly given premises to a necessary conclusion; typical examples are propositional and syllogistic reasoning tasks (see Figure 14.3 a) for an illustration).

• *Inductive reasoning* tasks require detecting the rule or regularity underlying a set of stimuli, for example by identifying the element that does not fit (in an ensemble of similar figures) or working with matrices (identifying a figure that replaces a placeholder within a matrix so that the pattern found in rows and columns persists) (see Figure 14.3 c) for an illustration). Formally, all inductive intelligence tasks are essentially enthymemes, that is, deductive inferences in which one or more premises are implicitly "added" rather than explicitly formulated (Wilhelm, 2004).

• *Quantitative reasoning* tasks assess quantitative-numerical components of reasoning. These demands may be deductive, inductive, or a combination of both. Typical examples of quantitative reasoning are mathematical word problems or number series (see Figure 14.3 b) for an illustration). In general, the difficulty lies in mathematical modeling, the numerical formalization of a problem, rather than in the actual calculation (Carroll, 1993).

A closer examination of the tasks subsumed under Carroll's three reasoning factors suggests that the sequential reasoning factor is predominantly a verbal reasoning factor, the inductive factor is mostly covered by tasks with figural content, and quantitative reasoning relies on numeric content. This interpretation is also supported by Carroll's observations (see Table 6.2 in Carroll, 1993, p. 217) and his interpretation of the factors in the individual studies. Wilhelm (2004) used confirmatory factor analysis to examine these relations among 12 different fluid intelligence tasks more closely. Among these tasks, prototypical indicators of deductive reasoning (e.g., propositions and syllogisms) and of inductive reasoning (e.g., series and matrices) were selected. The comparison of competing measurement models revealed that a model in which the correlation between inductive and deductive thinking was freely estimated described the data as well as a model in which inductive and deductive thinking were modeled as a common factor. Thus, a distinction between inductive and deductive thinking is artificial and unnecessary from the perspective of differential psychology. Another important finding was that a model with three correlated content factors, covering verbal, numeric, and figural stimulus material, described the data much better than a model with a single reasoning factor (Wilhelm, 2004). The model with three correlated content factors (with no other covariates) is statistically equivalent to a higher-order model in which the content factors load on a higher-order fluid intelligence factor. In line with previous research (e.g., Marshalek, Lohman, & Snow, 1983), the figural reasoning factor showed the strongest relation with the overarching fluid intelligence factor, which suggests that figural content is the best single indicator of fluid intelligence. In summary, the classification of fluid intelligence tasks based on their content is both theoretically and empirically well supported (see Figure 14.3 for example items). Please note that broad visual perception includes spatial ability components that are close to the reasoning factors discussed here (Lohman, 1996).

Developing a sound and efficient fluid intelligence task is more of an art than a science (Kyllonen & Christal, 1990). This state of affairs is predominantly due to a theoretical deficit: most available intelligence tasks suffer from a lack of well-founded theoretical assumptions about the cognitive processes required to successfully complete the tasks in question. Such an underlying theory could be used to derive procedures that generate items automatically, and it could provide *a priori* estimates of item difficulty. For example, for the figural aspect of fluid intelligence, numerous taxonomies for constructing matrix items have been proposed (Carpenter, Just, & Shell, 1990). In his review, Primi (2001) reduced the complexity of factors influencing item difficulty to four main attributes: (1) the number of elements, (2) the number of transformations or rules, (3) the type of rules, and (4) the perceptual organization. The success of this proposal and similar efforts is mixed, and moreover, most efforts are limited to specific types of tasks.
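As a minimal sketch of how such a taxonomy could yield *a priori* difficulty estimates, the snippet below scores a matrix item from Primi's four attributes. The function, its inputs, and the weights are entirely hypothetical; in practice the weights would have to be estimated from real response data:

```python
def predicted_difficulty(n_elements, n_rules, rule_complexity, perceptual_load):
    """Crude linear difficulty score for a matrix item; higher means harder.
    The weights are invented for illustration, not empirical estimates."""
    weights = {"elements": 0.2,     # (1) number of elements
               "rules": 0.5,       # (2) number of transformations/rules
               "rule_type": 0.8,   # (3) complexity of the rule type
               "perception": 0.4}  # (4) perceptual organization
    return (weights["elements"] * n_elements
            + weights["rules"] * n_rules
            + weights["rule_type"] * rule_complexity
            + weights["perception"] * perceptual_load)

# A simple item vs. an item with more elements, more rules, and harder rules:
easy = predicted_difficulty(n_elements=2, n_rules=1, rule_complexity=1, perceptual_load=1)
hard = predicted_difficulty(n_elements=4, n_rules=3, rule_complexity=3, perceptual_load=2)
print(f"{easy:.1f} {hard:.1f}")  # → 2.1 5.5
```

The mixed empirical success mentioned above reflects exactly the weak point of such sketches: the additive, task-specific weighting rarely generalizes beyond one item type.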

A promising approach to circumvent these problems and to gain a more profound understanding of reasoning is to instead rely on the concept of working memory capacity (WMC). Going beyond task-specific models of what changes the difficulty and nature of a task, WMC can be applied to many working memory items by specifying the storage and processing demands of a task. In a memory updating task, for example, subjects might be shown digits presented in four different locations. These digits disappear, and subjects briefly receive instructions for simple computations at the location of individual digits, one after another. After several such computations, subjects are asked to provide the final results for each of the locations. Such tasks can easily be generated by computers, and their difficulty can be predicted very well with just a few task attributes. WMC tasks might not only prevent some of the problems prevalent with reasoning measures, but they are also key to understanding fluid intelligence and intelligence in general (Engle, 2018; Oberauer, Farrell, Jarrold, & Lewandowsky, 2016).
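The memory updating paradigm described above can be sketched as a small generator (our illustration; the specific parameters such as four locations, six updates, and additive changes of ±1 or ±2 are assumptions for the example, not a published task specification):

```python
import random

def memory_updating_trial(n_locations=4, n_updates=6, seed=None):
    """Generate one memory-updating trial: initial digits at several locations,
    a sequence of simple arithmetic updates, and the correct final answers."""
    rng = random.Random(seed)
    state = [rng.randint(1, 9) for _ in range(n_locations)]
    initial = list(state)
    updates = []
    for _ in range(n_updates):
        loc = rng.randrange(n_locations)       # which location to update
        delta = rng.choice([-2, -1, 1, 2])     # simple computation to apply
        updates.append((loc, delta))
        state[loc] += delta                    # track the correct answer
    return initial, updates, state

initial, updates, answers = memory_updating_trial(seed=42)
print("initial digits:", initial)
for loc, delta in updates:
    print(f"location {loc}: {'+' if delta > 0 else ''}{delta}")
print("correct final answers:", answers)
```

The point of the sketch is the one made in the text: items can be generated automatically, and the storage demand (number of locations) and processing demand (number of updates) are explicit parameters that plausibly drive difficulty.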

The relation between WMC and fluid intelligence has received considerable attention (Kane, Hambrick, Tuholski, Wilhelm, Payne, & Engle, 2004; Oberauer, Schulze, Wilhelm, & Süß, 2005), and there is a broad consensus that this relation is very strong, though not perfect (Kyllonen & Christal, 1990). The main reasons for the very strong, but not perfect, relation might be twofold. First, although this is unwanted, many reasoning tasks do have knowledge requirements that might bias the relation with WMC downwards. Second, many WMC tasks have an intrinsic speed requirement because they limit stimulus exposure or the time window for responding. If these biases were adjusted for, the relation between fluid intelligence and WMC might be perfect (Wilhelm, Hildebrandt, & Oberauer, 2013).
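One classical form of such an adjustment, for the special case of measurement error, is Spearman's correction for attenuation: dividing an observed correlation by the square root of the product of the two reliabilities estimates the correlation between the underlying true scores. The numbers below are invented for illustration, and the knowledge and speed confounds discussed in the text would require additional modeling beyond unreliability:

```python
def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimate the true-score
    correlation from an observed correlation and the two reliabilities."""
    return r_xy / (rel_x * rel_y) ** 0.5

# Invented example values: an observed gf-WMC correlation of .75 with
# reliabilities of .80 and .75 implies a near-perfect latent relation.
print(round(disattenuate(0.75, 0.80, 0.75), 2))  # → 0.97
```

The sketch makes the qualitative point from the text concrete: an observed correlation well below 1 is compatible with an essentially perfect relation at the level of the constructs.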

#### 14.2.1.2 Tasks for Measuring Crystallized Intelligence

According to Cattell (1971), crystallized intelligence should be seen as the result of the investment of fluid intelligence in learning situations, but it also depends on additional sources such as investment traits (Ackerman, 1996) and interests (Su, Rounds, & Armstrong, 2009). Thus, gc reflects the impact of education, learning, and acculturation on knowledge-related intelligence tasks. During the school years, the item universe for gc measurement is at least partly predetermined through the canon of formal education and through cultural standards that roughly prescribe what children and adolescents are expected to learn and know (Cattell, 1971). This notion suggests an assessment of gc via factual knowledge tests that capture both school and extracurricular content (for an example item, see Figure 14.4). As learning opportunities become more and more diverse across the lifespan and after regular schooling, the assessment of gc becomes increasingly difficult. An ideal measurement of gc would have to include the whole variety of knowledge that people *can* acquire during their lives (and that is somewhat culturally valued). Consequently, it would require as many different tasks as there are occupations, recreational activities, and other differential learning opportunities. The central role of knowledge in the concept of crystallized intelligence is also emphasized by Ackerman (1996), who stated that gc measures should not be an in-depth assessment of knowledge within a specific domain or a few selected domains; rather, gc measures should be conceptually broad.


Figure 14.4: Example item for crystallized intelligence.

In reality, gc is predominantly assessed via verbal indicators such as vocabulary and verbal fluency tasks. There is no doubt that language skills are important and a result of formal education and, thus, of culturally shared knowledge. This idea is also consistent with the description of gc in Carroll's Three-Stratum Theory (1993). However, the factor-analytic results could instead be an artifact of current assessment practices, which overrepresent verbal ability measures.

But it is also apparent that language command describes only a section of culturally shared knowledge. In fact, in a large-scale educational assessment study, Schipolowski, Wilhelm, and Schroeders (2014) administered various language tasks, including reading comprehension, listening comprehension, language use, and writing, together with a broadly sampled knowledge test covering 16 content domains (e.g., physics, art, law) to an unselected sample of 6,071 adolescents. The correlation between latent variables representing language command and knowledge was very high (ρ = .91), but significantly different from unity. About 17% of the variance in the knowledge factor was independent of individual differences in language command and fluid intelligence (and vice versa). Thus, a restriction to purely language-related content must be regarded as deficient in light of the abovementioned definition of gc, because it equates a part of gc with the overarching gc factor (Amthauer, Brocke, Liepmann, & Beauducel, 2001). Please note that command of language may or may not be different from the concept labeled verbal ability by some researchers.
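As a quick arithmetic sketch (ours, not the authors'): squaring a latent correlation gives the proportion of shared variance, and for ρ = .91 the remainder is numerically consistent with the roughly 17% unique variance reported above (the reported figure additionally controls for fluid intelligence, so the match is illustrative rather than exact):

```python
rho = 0.91          # latent correlation, language command vs. knowledge
shared = rho ** 2   # proportion of variance shared between the two factors
unique = 1 - shared # variance in one factor not shared with the other
print(f"shared: {shared:.0%}, unique: {unique:.0%}")  # → shared: 83%, unique: 17%
```

This is why a very high correlation can still be "significantly different from unity" in a substantively meaningful way: even ρ = .91 leaves a non-trivial slice of reliable variance unexplained.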

Cattell (1971) also drew attention to the fact that verbal ability tasks do not necessarily cover gc adequately, especially if the verbal content is strongly over-trained or decontextualized knowledge. Furthermore, knowledge tests have the greatest potential to minimize the risk of being confounded with fluid intelligence. The maximal separation of gc and gf should be an overriding principle in constructing efficient and distinct measures of cognitive ability (Carroll, 1993). Language skills and reasoning abilities are minimal requirements for knowledge tests, as they are necessary to understand the question and answer options at a basic level. Taken together, we conclude that declarative knowledge tests covering as many areas of knowledge as possible should be used as marker variables of gc, as they reflect a variety of learning experiences that go beyond language skills and competencies.

#### 14.2.2 Validity of Intelligence Tests

Up until this point, we have presented different conceptualizations of intelligence and ways to measure it. We can also take a very pragmatic position when discussing the strengths and benefits of intelligence testing. Intelligence tests are used in psychology, educational research, and other behavioral sciences for a wide range of purposes because intelligence is one of the best predictors of educational, vocational, and academic success and of job performance (e.g., Schmidt & Hunter, 1998; Schmidt, 2002). Intelligence in this context mostly refers to the ability to reason (gf) and to domain-related knowledge (gc). The predictive validity of both components seems to vary over the course of life. In a comprehensive review, Baumert and colleagues (2009) compared the results of various educational-psychological studies and showed that the predictive power of domain-specific knowledge relative to reasoning becomes more important the older students are. Obviously, the contributions of gf and gc are hard to distinguish because they are strongly correlated. The relevance of knowledge for significant outcomes and its underrepresentation in contemporary intelligence assessment led Ackerman (2000) to the conclusion that domain-specific knowledge is the "dark matter" of adult intelligence. His PPIK theory (intelligence-as-process, personality, interests, and intelligence-as-knowledge; Ackerman, 1996) builds on Cattell's gf-gc theory. It distinguishes several types of knowledge (e.g., occupational knowledge) to give domain-specific knowledge the space it deserves.

Much research has been conducted to shed light on the developmental interplay between gf and gc. In his investment theory, Cattell (1971) proposed that crystallized knowledge develops through the investment of fluid ability. However, empirical evidence for this assumption is sparse. For example, Ferrer and McArdle (2004) used linear dynamic models to study the trajectories of gf and gc from childhood to early adulthood. The results showed no coupling between gf and gc within the studied age range, which clearly contradicts the investment theory. When reviewing the available empirical evidence and methodological approaches on the development of gf and gc, it becomes evident that there is no direct or simple explanation for the development of, and mutual relation between, cognitive abilities in general, and gf and gc in particular. To overcome this issue, Savi, Marsman, van der Maas, and Maris (2018), for example, proposed abandoning factor-analytic methods in intelligence research and instead conceptualizing intelligence as evolving networks in which new knowledge and processes are wired together during development. This approach might also bridge the gap between the study of individual differences in intelligence and phenomena primarily studied in cognitive psychology, such as forgetting.

The great importance of intelligence is evident not only in school and university education (Kuncel & Hezlett, 2007; Schmidt & Hunter, 1998), but also in professional training (Ziegler, Dietl, Danay, Vogel, & Bühner, 2011). As a cautionary note, even though intelligence is the most influential single predictor of academic achievement, it still accounts for only about a quarter of the variation in this outcome. Accordingly, successful learning at school and university depends on a plethora of individual characteristics in addition to intelligence, such as the personality trait conscientiousness (Barrick & Mount, 1991) or interests (Holland, 1997).

A last aspect of predictive validity we would like to touch upon has to do with death. Initially labeled "ultimate validity" (O'Toole & Stankov, 1992), the relevance of intelligence for longevity has become increasingly clear. It turns out that intelligence might be an essential contributor to epidemiological outcomes, in that premorbid intelligence predicts all sorts of health-related behaviors and diseases, which in turn are related to mortality (Batty, Deary, & Gottfredson, 2007).

# 14.2.3 Training of Intelligence

"How Much Can We Boost IQ and Scholastic Achievement" was the title of an influential and very controversial paper published in the late sixties (Jensen, 1969). In this paper, Jensen drew a somewhat pessimistic conclusion concerning interventions intended to improve IQ or scholastic achievement. In their notorious book, "The Bell Curve: Intelligence and Class Structure in American Life", Herrnstein and Murray (1994) also concluded with negative inferences concerning the improvement of IQ and scholastic achievement. The contributions by Gottfredson (1997) and Neisser et al. (1996) for defining intelligence as a concept (discussed earlier in this chapter) were, in fact, both reactions to the controversy triggered by the Hernstein and Murray book. Importantly, both publications suggested relatively explicitly that many of the observed group differences in IQ and scholastic achievement are determined genetically. Obviously, today's scientists

working in the fields of behavior or molecular genetics of traits have gained a more profound understanding of heritability and use more advanced statistical methods and designs to study the relevance of nature and nurture.

For example, Plomin and von Stumm (2018) summarized recent findings on genome-wide association studies, identifying genome sequence differences that account for 20% of the 50% heritability of intelligence. Such reports on the genetic transmission of intelligence seem to be contradicted by the fact that schooling affects both scholastic achievement (for a comprehensive account, see the classes of evidence described by Ceci, 1991) and intelligence (Becker, Lüdtke, Trautwein, Köller, & Baumert, 2012; Cliffordson & Gustafsson, 2008). However, there is nothing contradictory about these findings once genetic effects are interpreted correctly (Johnson, Turkheimer, Gottesman, & Bouchard, 2010). Also, in a recent meta-analysis of quasi-experimental studies with strong designs (i.e., those that allow statements about increases in intelligence as a function of schooling), Ritchie and Tucker-Drob (2018) summarized overwhelming evidence for education being the most consistent, robust, and durable method for raising intelligence. They found an increase of between 1 and 5 IQ points for each additional year of schooling.

Relatedly, it can be shown that non-detrimental or supportive environments have a positive effect on intelligence over a broader time period (Flynn, 1984). Despite some contradictory results, the so-called Flynn effect might in fact not have leveled off in the past two decades (Trahan, Stuebing, Fletcher, & Hiscock, 2014). Besides the aforementioned changes in the educational system, various factors have been discussed as being responsible for these IQ gains. In particular, education and health-related factors, such as better nutrition and reduced pathogen stress, appear to be related to IQ gains (Pietschnig & Voracek, 2015).

The evidence presented so far is correlational and at a macroscopic level. If we want to answer the question laid out at the beginning of this section, we should take a closer look at the experimental evidence. Prior to evaluating such evidence, the benchmark for the evaluation should be clear. Training effects on intelligence should a) persist after the training has ended (effect duration), b) be present in non-trained tasks (effect transfer), c) be specific to the targeted intelligence factor, so that only the trained aspects improve rather than everything, d) be stronger in trained than in non-trained subjects (who should be engaged in some other training instead of simply waiting), and e) be rational and sensible, in the sense that the intervention is tailored to what it should accomplish and provides a non-trivial gain.

Moreover, bearing in mind the current replication crisis (Open Science Collaboration, 2015), training studies should fulfill the requirements of experimental studies concerning sample size, sound measurement instruments, *a priori* specified outcome variables, etc. Unfortunately, many popular studies that received extensive mass-media coverage do not adhere to these requirements (Melby-Lervag, Redick, & Hulme, 2016). Accordingly, many of the bold claims about successfully training intelligence or its most important facets (e.g., Jaeggi, Buschkuehl, Jonides, & Perrig, 2008) can be attributed to methodological flaws rather than to some miraculous interventions (Melby-Lervag et al., 2016).

Reviewing most interventions shows that they were designed with the hope that a few hours of training would bring about long-lasting, transferable, and relevant improvements in highly general intellectual abilities. This claim is not only bold; it is completely unrealistic. Even if we adhere to a lifestyle that spares us intellectual effort, we can hardly be functioning members of society if we do not regularly engage in effortful, intellectually challenging thinking. In other words, our everyday lives provide daily intellectual exercise, no matter how trivial and dull it feels from time to time. Whether we like it or not, we use our intellectual capacity constantly. Training must therefore provide a sufficiently large additional dosage to make a real difference. Moreover, the mechanisms stressed by intelligence training should also be suited to bring about the desired change. Alas, most training programs, in the sense of over-learning rather simple tasks, just have people repeatedly completing different variations of the same type of question. Simply adjusting the difficulty of the questions to, say, 50% is not an impressive improvement over this retesting-ad-nauseam approach. Studies with a more substantial dosage provide a much better read and a more realistic picture (Schmiedek, Lövdén, & Lindenberger, 2010).

Another field in which fostering intellectual functioning has been studied is cognitive ageing. The use-it-or-lose-it hypothesis (Hultsch, Hertzog, Small, & Dixon, 1999) suggests that being intellectually active prevents age-associated cognitive decline. Obviously, it is difficult to collect strong data on cognitively active lifestyles over decades and, thus unsurprisingly, there still seems to be no conclusive evidence (Salthouse, 2006). Given that intellectually engaging activities will hardly have adverse effects, living a mentally active life is not a bad choice. However, hoping to maintain or improve your intelligence by skipping physical activity in exchange for intellectual activity is probably a bad idea in the long run, as physical exercise has been shown to be beneficial for intellectual functioning (Kramer & Colcombe, 2018).

#### 14.3 Conclusions

We want to use this section to point out a few pervasive problems in intelligence research, raise open questions, and hint at potential solutions for such problems. We began this chapter by highlighting that intelligence research is about individual differences and covariation, whereas most other chapters in this book are about the general psychology of cognition and experimental effects. There is some lamentation about the unfortunate nature of the barriers between these two disciplines (Cronbach, 1957). Indeed, the intelligence model we introduced as widely accepted shows a substantial lack of cognitive sophistication. For example, despite its essential role in intelligence research, our understanding of most reasoning tasks is severely limited. Popular definitions of the construct often stress the novelty of reasoning tasks as an essential feature, yet we have no clear idea of what novelty actually means. Usually, these discussions move on by pointing to induction, deduction, and sometimes abduction; but rarely is there a connection between reasoning tasks used in intelligence research and the same tasks being used in experimental settings to study competing theories about inductive thinking, for example. Taken together, the lamentation about the two disciplines of psychology remains justified.

In the end, gc can be considered a collection of all sorts of pragmatic and knowledge-driven thinking. We have merely begun to understand the breadth of the aspects subsumed here: wisdom, command of a language, foreign-language aptitude, declarative and procedural knowledge of all sorts, etc. Crystallized intelligence needs a lot more attention. And research on gc demands specific methods, due to its intrinsic orientation toward change and its idiosyncratic growth over the course of one's life.

A closer look at the general learning and retrieval factor (glr) provokes a few questions, too. The factors below glr mostly refer to specific methods for measuring memory. Of course, no one can seriously claim that associative memory is a different memory store than free-recall memory, for example, even though the factor labels suggest so. Additionally, researchers are at a loss when it comes to choosing a glr test, because the method selected heavily affects the outcomes. A much stronger connection with experimental approaches is essential to further our understanding of this factor.

The discussion of potential shortcomings of the taxonomy we endorse here could go on endlessly. Should originality and creativity really be located below learning and retrieval? What about interpersonal abilities, such as emotional competence? Clearly, there is no shortage of questions and problems. It is therefore important to understand this taxonomy as a starting point rather than as an end result. There is much to be improved, but intelligence testing in all its varieties is also a major success story from an applied perspective: it is a strong predictor of several desirable outcomes and is no doubt essential for determining how cognitively rich our lives are.

#### Summary


#### Review Questions


### Hot Topic: Sex Differences in Crystallized Intelligence?

Oliver Wilhelm

Few topics in ability research are as controversial as sex/gender differences in cognitive abilities. According to the Gender Similarity Hypothesis (Hyde, 2005), sex differences in cognitive abilities are mainly small and unsystematic. This general conclusion is empirically supported for fluid intelligence but is challenged for crystallized intelligence when measured with knowledge tests. Most studies show that males outperform females in general knowledge, with an average overall effect of *d* = .26 (Schroeders, Wilhelm, & Olaru, 2016). A closer examination at the level of domains reveals a more complex pattern: for example, females clearly outperform males in health-related domains, such as aging and nutrition, whereas large differences in favor of males were found for technology and the natural sciences (Ackerman, Bowen, Beier, & Kanfer, 2001). It is striking that such stereotypic sex-related differences in knowledge domains seem to match the sex differences in interests reported in the famous "Men and Things, Women and People" meta-analysis by Su and colleagues (2009). On the other hand, we should avoid overgeneralizing such differences. For example, the magnitude and direction of sex or gender differences in mathematical competencies vary dramatically across countries (Stoet & Geary, 2013).

Ulrich Schroeders (Photo: Sonja Rode/ Lichtfang.net)

An aspect that is often neglected in studies on group differences in cognitive abilities is item sampling. In the same way that participants in a study are selected from a population (person sampling), items can be thought of as being drawn from a population of items (item sampling). In the construction and validation of psychological measures, we usually assume that we draw items from a theoretically infinite item universe. In a recent study, we put this idealistic assumption to the test (Schroeders et al., 2016). We used metaheuristic sampling procedures (i.e., ant-colony-optimization algorithms) to compile psychometrically sound short forms of a knowledge test. The algorithm was set to meet two criteria: (a) to select items from an initial set that adhere to strict psychometric criteria concerning the fit of the data to a model, and (b) to deliberately tilt sex differences to favor either males or females. The results show that sex differences vary considerably depending on the indicators drawn from the item pool. In other words, we could compile knowledge tests for science and technology in which females outperformed males, and we could also compile health tests in which males outperformed females. This result questions the generalizability of previously reported findings on sex differences in crystallized intelligence. More generally, the results corroborate Loevinger's (1965, p. 147) notion that the assumption of random sampling of items (and tests) is unrealistic because test development is "almost invariably expert selection rather than sampling". Unfortunately, many studies concerning group differences in cognitive abilities fail to acknowledge such item-selection effects.
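The core of this item-sampling argument can be sketched in a few lines of code. The following is not the ant-colony-optimization procedure used in the study; it is a deliberately simplified greedy selection over an invented item pool (all item probabilities are synthetic), meant only to show how two short forms drawn from the same pool can imply opposite sex differences.

```python
import random

# Simplified sketch of deliberate item selection (NOT the ant-colony
# procedure from the study); the item pool below is entirely synthetic.
random.seed(1)

# Each item gets an invented probability of a correct answer per group.
pool = [{"id": i,
         "p_male": random.uniform(0.3, 0.9),
         "p_female": random.uniform(0.3, 0.9)}
        for i in range(40)]

def expected_gap(items):
    """Expected male-minus-female score difference on a short form."""
    return sum(it["p_male"] - it["p_female"] for it in items)

def build_short_form(pool, k, favor):
    """Greedily pick the k items that tilt the gap toward one group."""
    sign = 1 if favor == "male" else -1
    ranked = sorted(pool,
                    key=lambda it: sign * (it["p_male"] - it["p_female"]),
                    reverse=True)
    return ranked[:k]

female_form = build_short_form(pool, 10, favor="female")
male_form = build_short_form(pool, 10, favor="male")

# Two short forms from one pool, with opposite expected sex differences.
print(round(expected_gap(female_form), 2),
      round(expected_gap(male_form), 2))
```

A real application would add psychometric constraints (model fit, reliability) to the selection criterion, which is what the metaheuristic search in the study handles; the point here is only that expert item selection, rather than random sampling, can move group differences in either direction.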

#### References


# References


Baumert, J., Lüdtke, O., Trautwein, U., & Brunner, M. (2009). Large-scale student assessment studies measure the results of processes of knowledge acquisition: Evidence in support of the distinction between intelligence and student achievement. *Educational Research Review*, *4*, 165–176. doi:10.1016/j.edurev.2009.04.002


Cronbach, L. J. (1957). The two disciplines of scientific psychology. *American Psychologist*, *12*, 671–684. doi:10.1037/14156-015



van der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. *Psychological Review*, *113*, 842–861. doi:10.1037/0033-295x.113.4.842


# Glossary


# Chapter 15

# Creativity: An Overview of the 7C's of Creative Thought

TODD LUBART<sup>1</sup> & BRANDEN THORNHILL-MILLER<sup>1,2</sup>

University Paris Descartes<sup>1</sup> & University of Oxford<sup>2</sup>

Creativity refers to original thinking that leads to new productions that have value in their social context (see Runco & Jaeger, 2012). Creative thinking can be distinguished from routine thinking, in which regular cognition yields run-of-the-mill, common ideas. Many human activities involve regular thinking; creativity comes into play when a new idea or a new solution is sought. The topic of creativity, as a fundamental aspect of human thinking, can be understood through a "7 C's" approach (Lubart, 2017). Just as the "Seven Seas" refer historically to all the major bodies of water on Earth, the 7 C's of creativity refer to all the main aspects of the topic that are helpful for mapping its territory: Creators (person-centered characteristics), Creating (the creative process), Collaborations (co-creating), Contexts (environmental conditions), Creations (the nature of creative work), Consumption (the adoption of creative products), and Curricula (the development and enhancement of creativity). In this chapter, the main concepts for each "C" will be surveyed and presented.

# 15.1 Creators: Person-Centered Characteristics

Creators refer to all those who engage in creative thinking. In fact, every human being can be characterized as a creator and as "creative" to some degree. We tend to think spontaneously of great, eminent creators such as Leonardo da Vinci, Marie Curie, Jane Austen, or Pablo Picasso. However, these eminent creators represent the pinnacle of a much larger set of creative people, who deploy their original thinking in their everyday lives and work (Kaufman & Beghetto, 2009).

Thus, professional or workplace creators are those who are creative, or "innovative", in their job context. Some jobs, such as those of visual artists, writers, designers, musical composers, or engineering inventors, require creativity as a core part of the work. However, there is a much broader set of jobs in which creativity can be very important on a regular but more intermittent basis, as is the case for managers, lawyers, teachers, doctors, and other healthcare workers. Finally, in still other jobs, creativity can sometimes be very useful, albeit on a sporadic basis, such as for pilots, accountants, and security agents. In all these cases, the professional environment recognizes the value of new ideas and aims, at least in theory, to promote their development and implementation.

Beyond professional settings, creativity can occur in daily-life situations, at home, with family or friends, or in leisure activities. Some people may invent a new recipe for family meals, even though they are not professional chefs. Others may have a new idea for a club activity or a novel solution to problems between friends, and some people may find a way to fix a broken item in their home. All of these examples illustrate creativity in "everyday life" settings, usually with some recognition by other people in the immediate social environment.

Finally, creativity can be conceived at a strictly intra-personal level. Indeed, when people learn about new topics, they create cognitive structures that allow them to understand the topics; they generate concepts that are new to them, although possibly already very well known to others. This is a kind of creative thinking at the individual level, which perhaps serves the person him- or herself. It is reminiscent of Piaget's proposal that children act like little scientists, generating their own hypotheses and rediscovering concepts. It is also possible to view a person's life path and self-development as a creative act, event, or process. In this humanistic tradition, each person designs his or her life path and sculpts who he or she is, as an ongoing, lifelong creative work.

Needless to say, there are large individual differences in creativity. Some people produce more highly creative work than others in their professional setting, in their everyday-life activities, or in their intrapsychic sphere. For example, in science, some creators propose groundbreaking contributions (such as Einstein), whereas others propose original ideas that gain some recognition in their specific scientific domain; many scientists work within existing paradigms, doing "normal" science, which may replicate or slightly extend existing findings (see Kuhn, 2012). There has been debate on the extent to which the same basic psychological "ingredients", such as mental flexibility and risk taking, underlie these diverse manifestations of creativity. Essentially, variations in the quantity and quality of each ingredient, as well as the specific combination of the multiple ingredients, can lead to the wide range of creativity observed across individuals, sometimes yielding the eminent, field- or culture-changing big-"C" cases of creativity (Kaufman & Beghetto, 2009; Sternberg & Lubart, 1995). This is the basis for the multivariate approach, according to which multiple factors are necessary for creativity, and the interaction of these ingredients during the creative process leads to the wide range of creative achievement (see Amabile, 1996; Lubart, 1999).

More than a century of work has investigated the "ingredients" that play a role in creativity. In other words, are there characteristics that creative people tend to share? From early studies of "creative imagination" to modern neuroscientific research on brain networks (Vartanian, Bristol & Kaufman, 2013), from case studies of great creators such as Sigmund Freud and Martha Graham (see Gardner, 1993) to correlational studies of cognitive and personality characteristics related to creative achievement (Batey & Furnham, 2006; Feist, 1998; Feist, Reiter-Palmon & Kaufman, 2017), and on to controlled experimental and neuroimaging studies, a large number of person-related characteristics have been identified as relevant to creativity. The exact set of these characteristics varies to some extent with the domain of creative thinking (such as visual art, literary writing, or social problem solving) and the specific task to be accomplished. The specific set of ingredients and their relative weights can be identified through a task analysis, and by comparing and contrasting people who achieve relatively more creative output with those who achieve less.

We will describe two main kinds of ingredients: abilities and traits. Creativity-relevant abilities refer to information-processing capacities that favor the encoding, comparison, and combination of information for purposes of original thinking (Sternberg & Davidson, 1995). Creativity-relevant traits refer to preferred ways of behaving (these traits are expressed through personality, thinking styles, or motivational patterns) that favor original thinking (see Sternberg & Lubart, 1995).

In Table 15.1, several abilities and traits that often have been found to be important for creativity are listed. This table presents a representative set of ingredients for creativity but is not exhaustive.

Table 15.1: Examples of person-centered ingredients for creativity.

In Figure 15.1, the relationships between the ingredients listed in Table 15.1 and other key concepts concerning creativity are illustrated. First, there are several ingredients, cognitive and non-cognitive (conative or affective), that are person-centered. Second, there are also ingredients that are environment-centered (these will be described in the section concerning the "C" of "Context"). These ingredients, person-centered and context-centered, provide the basis for a person's creative potential. Creative potential refers to the resources that a person can profitably invest in any given activity, such as writing a story or inventing a machine. The potential is latent and may not be put into play unless a person actively engages in a task. The ensuing process, called "Creating", is a chain of events in which the ingredients are deployed and through which the work advances. This chain of events leads ultimately to a resulting production, a "Creation", which will be more or less original and valuable.

Figure 15.1: Multivariate approach to creativity.

It is important to note that a given person's ingredients can be seen as offering various degrees of creative potential, depending on the task or domain of work. For example, in Figure 15.2, a hypothetical "radar" profile of a person's ingredients is depicted together with the ingredient levels expected to be needed to be highly creative in tasks A and B; the individual depicted has relatively more potential to be creative in task A than in task B, because the required ingredients are somewhat different for each task and the individual's profile best matches the profile needed for task A. For task A, only some extra risk taking may be needed, whereas for task B, additional mental flexibility, knowledge, risk taking, idiosyncrasy, and intrinsic motivation would be required. This type of model shows how the partial domain specificity of creative ability can be understood. The correlations of people's performance across creativity tasks are positive in general, but weak to moderate, often ranging from .20 to .60 (Baer, 1993). The correlations observed between creative performance tasks reflect the fact that even when some ingredients are shared across all tasks, they are weighted differently in each task's own specific mix of ingredients.
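This profile-matching logic can be made concrete with a toy calculation. All numbers below are invented for illustration: the person's ingredient levels, the levels each task is assumed to require, and the simple "shortfall" score are hypothetical choices, meant only to show one plausible way of quantifying the mismatch between a profile and a task's requirements.

```python
# Toy profile-matching sketch; all ingredient levels are hypothetical.
person = {"flexibility": 7, "knowledge": 6, "risk_taking": 5,
          "idiosyncrasy": 6, "intrinsic_motivation": 7}

# Assumed requirement profiles for two tasks (cf. the radar profiles).
task_a = {"flexibility": 7, "knowledge": 6, "risk_taking": 6,
          "idiosyncrasy": 5, "intrinsic_motivation": 6}
task_b = {"flexibility": 9, "knowledge": 8, "risk_taking": 7,
          "idiosyncrasy": 8, "intrinsic_motivation": 9}

def shortfall(person, task):
    """Sum of missing ingredient levels; lower means more potential."""
    return sum(max(0, need - person[key]) for key, need in task.items())

# Task A lacks only some risk taking; task B lacks several ingredients.
print(shortfall(person, task_a), shortfall(person, task_b))  # 1 10
```

With these invented numbers, the person falls short on a single ingredient for task A but on five ingredients for task B, mirroring the comparison described for Figure 15.2.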

To illustrate these person-centered ingredients for creativity, consider the following examples. Two "cognitive" ingredients and two "conative" ingredients will be described, although there are many others that play important roles as well.

First, the capacity to engage in flexible thinking can be highlighted. *Cognitive flexibility* refers to the ability to approach a topic from an alternative perspective compared to the standard view; it involves letting go of one idea in order to explore a different one. Cognitive flexibility is the capacity to sidestep thinking habits, to get out of a stereotyped way of seeing an issue or solving a problem; it is the opposite of rigid thinking, which characterizes a locked perspective that is more likely to lead to being conceptually blocked in problem solving. Habits are learned patterns that facilitate cognition and often reduce mental workload. However, habits also inhibit original thinking. In this regard, flexibility supports creativity.


Figure 15.2: Individual profile and two sample task profiles.

With respect to cognitive capacities, one issue that has been studied consistently for more than half a century is the relationship between creativity and intelligence. Guilford and Christensen (1973) noted that studies of intelligence tests and creativity (the latter measured mainly through divergent-thinking tasks) showed weak positive correlations, and that the scatterplots often had a "triangular-shaped" distribution of data points, with few people who had low intelligence test scores showing moderate to high levels of creativity. Later, a meta-analysis of studies correlating intelligence and creativity showed an average correlation of .17 (Kim, 2005). Whereas there is no clear consensus concerning a threshold beyond which more intelligence does not matter, Karwowski et al. (2016) used necessary condition analysis (which tests for the systematic absence of a phenomenon, here creativity, at certain levels of a variable, here intelligence) and found that low levels of intelligence are a limiting condition for the manifestation of creativity.
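The triangular, necessary-condition pattern can be illustrated with a small simulation. The data-generating assumptions here are ours, not those of Karwowski et al.: creativity scores are drawn uniformly below an intelligence-dependent ceiling, which produces no highly creative low-IQ cases while keeping the overall linear correlation modest.

```python
import random

# Illustrative simulation of a "triangular" intelligence-creativity
# scatter; the data-generating model is an assumption, not real data.
random.seed(0)

n = 5000
iq = [random.gauss(100, 15) for _ in range(n)]

# The creativity ceiling rises with IQ in the low range, then flattens.
ceiling = [min(1.0, max(0.01, (x - 55) / 30)) for x in iq]
creativity = [random.uniform(0, c) for c in ceiling]

def pearson(xs, ys):
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(iq, creativity)

# Low intelligence acts as a limiting condition: no one with a low IQ
# reaches a high creativity score, yet the overall correlation is small.
lowest_iq_among_high_creatives = min(
    x for x, c in zip(iq, creativity) if c > 0.8)
print(round(r, 2), round(lowest_iq_among_high_creatives, 1))
```

The simulation only shows that a weak overall correlation is compatible with intelligence being necessary but not sufficient for high creativity; the actual threshold and effect sizes are an empirical matter.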

A second example of a characteristic that is important for creativity is *knowledge*. Knowledge refers to information that may be characterized by its depth or its breadth. Both facets of knowledge are important for creativity. In general, knowledge about a topic potentially allows a person to build on existing ideas, to avoid repeating what has been done in the past, and to focus attention on what is new and valuable in a field. In this sense, *depth* of knowledge can facilitate creativity to some extent. However, too much of a good thing can be a problem. In fact, some research suggests that high levels of expertise can hinder creative thinking because experts get stuck in routine ways of approaching an issue, even when new ways may be more appropriate (Dror, 2011; Frensch & Sternberg, 1989; Simonton, 1984). Breadth of knowledge offers the opportunity to associate concepts that may not be habitually connected. Knowing about diverse topics may facilitate analogical or metaphorical thinking because one can apply concepts from a different domain to the topic or problem. Analyses of Charles Darwin's notebooks during his trip to the Galapagos Islands, when he proposed the theory of evolution, for example, clearly illustrate the ways in which his botanical knowledge served as a basis for thinking about the mechanisms at work in animal species (Gruber, 1981).

A third example can be drawn from the conative domain, which refers to the wish, intention, and motivation to engage in an activity. The proclivity for *risk taking* refers to the tendency to engage in behaviors in which there is potential for gain or loss and the outcome is not completely predictable. For example, in a high-risk situation, the odds may be low that a new approach to a problem could lead to a desired, valued solution. In this case, a person oriented toward risk taking may choose to invest his or her resources, energy, and time in this nascent idea. Despite the probability of failure, some people will go "against the odds" and pursue a new idea. Risk taking supports creativity, in general, because creativity by nature requires breaking away from what exists already, what is tried-and-true, what is known (and perhaps not optimal) but predictable. Research suggests that people's preferred levels of risk taking can vary from one domain of activity to another. For example, a person may be willing to take a risk in sports and attempt a new style in ice skating during a competition, but will not necessarily be willing to try a new style in a visual-arts task; another person may invest his or her energy in a new entrepreneurial business idea but not be at ease with proposing new ideas in a writing task. Therefore, it is useful to consider risk-taking patterns by activity domain instead of referring to a general risk-taking trait. In the investment theory of creativity, Sternberg and Lubart (1995) highlight the importance of risk taking, which supports the engagement in the search for new ideas that break from tradition. Even if a person has the needed cognitive abilities, there may be no engagement with new ideas if the person fears failure.

A fourth and final example of an ingredient for creativity is idiosyncrasy, or the tendency to experience the world in non-standard ways (Bierhoff & Bierhoff-Alfermann, 1973; Eysenck, 1995). Idiosyncrasy can be considered a personality trait that may express itself in one's way of perceiving and acting in the world. One form of idiosyncrasy that has been extensively explored and shown to be related to creativity is known as "positive schizotypy", a tendency to have unusual cognitive, perceptual, or emotional experiences that is well distributed in the normal population (Claridge, 1997). Idiosyncrasy in its several forms may apply in all facets of life. For example, in the emotional sphere, a person may experience non-standard emotions, or express their feelings in atypical ways. This could be termed "emotional idiosyncrasy." It is a potential source of personalized, non-typical associations or approaches to a situation, a topic, or a problem to be solved (Averill, 1999). For example, people with unusual affects associated with a given topic can benefit from this idiosyncrasy by developing unusual associations or approaches that people experiencing "standard" emotions about the same topic would not. A poet can, for example, use this affective richness to provide a unique, fresh perspective when engaged in literary creation.

### 15.2 Creating: The Creative Process

The creative process refers to the sequence of thoughts and actions that characterizes the generative act, resulting in an original, valuable production (Lubart, 2001; Finke, Ward & Smith, 1992). This act has traditionally been decomposed into stages, steps, or sub-processes (Sternberg, 2017). Early work based on introspective accounts of eminent creators and on observational studies using think-aloud protocols or analyses of traces of activity (such as creators' notebooks or drafts) suggested four main stages, traditionally labeled preparation, incubation, illumination, and verification (Sadler-Smith, 2015). Preparation refers to the accumulation of background knowledge and active thinking, which may span a relatively long period once a topic is engaged. Incubation denotes a type of mental activity in which ideas may be associated, explored in the fringe of consciousness, or reworked in the "back of one's mind" (Sio & Ormerod, 2009). Illumination is the "eureka" moment when a promising, new idea appears. This may in some cases be called an insight and is marked in particular by the novel nature of the idea that emerges. Verification is usually considered a mode of thinking in which new ideas are tested and refined. Numerous authors have proposed and examined additional steps, sub-processes, or modes of thinking, including problem finding, problem formulation, frustration, divergent thinking, association, idea resonance, benefiting from chance events, analysis, and synthesis (Mumford et al., 1991; Yokochi & Okada, 2005). All of these have enriched and expanded our understanding of the creative process.

Guilford (1950), in a classic presidential speech to the American Psychological Association, emphasized the topic of creativity and highlighted divergent thinking as a special part of the creative process. Divergent thinking characterizes an idea search conducted in multiple directions in order to obtain a large number of possibilities. In particular, "fluency" of performance on a divergent-thinking task refers to the number of ideas generated, whereas flexibility refers to the diversity of the ideas generated. It has been shown that generating many different ideas is likely to enhance the chances of generating an original idea; this is at least partly attributable to the nature of a typical sequence of ideas, in which more common ideas come first and more idiosyncratic ones arrive later, once the common, shared ideas have been exhausted. Guilford's (1985) work, including his contribution to the structure-of-intellect (SOI) model, drew attention to two other processes that play a major role in creative thinking: "evaluative" and "convergent" thinking. Evaluation refers to an analytic mode of thinking in which strengths and weaknesses are assessed and then provide guidance for further action. Convergence refers to thinking that leads to a single answer. Convergent thinking has often been associated with getting the single "right" answer, but this meaning of convergence is relevant mainly in run-of-the-mill cognitive tasks, which tend to yield relatively non-creative, standard ideas. Instead, consider the more general sense of convergence, in which various elements are brought together to lead to a single response. This act of converging may be achieved through an integration and synthesis of disparate elements, or their transformation, and leads, in the case of creative thinking, to a new idea. Thus, Guilford's legacy leads us to describe a three-mode process involving divergent-exploratory thinking, evaluative thinking, and convergent-integrative thinking.

Based on Guilford's research, as well as seminal work by Binet and Simon, in 1904, and other pioneers, creativity tests such as the Torrance Tests of Creative Thinking and Wallach and Kogan's creative-thinking measures were developed to assess the degree to which people can successfully engage the creative process (see Glaveanu, 2019; Torrance, 1974; Wallach & Kogan, 1965). In these batteries of creativity tests, people are essentially asked to generate many different original ideas using verbal or image-based stimuli. There are, for example, tasks that require thinking of ways to use a common object, drawing tasks in which a basic geometric form needs to be used in each different drawing, and title-generation tasks based on a picture that is provided. The number of ideas (called "fluency"), the flexibility, and the originality of ideas are often scored. Other measures, such as the Test of Creative Thinking – Drawing Production (Urban, 2005) or the Remote Associates Test (Mednick, 1962), involve several elements (graphic or verbal) that the individual must find a way to synthesize and combine to express an original idea. In these latter cases, the production of one synthetic idea is required rather than the production of many different ideas.
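As a rough sketch of how such scoring can work, consider the following toy example. The responses, category labels, and the rarity-based originality index are all invented for illustration and do not reproduce the official scoring rules of any published test.

```python
from collections import Counter

# Toy divergent-thinking scoring: fluency, flexibility, originality.
# Responses to "uses for a brick"; each idea carries an (invented)
# semantic category used for the flexibility count.
responses = {
    "ann": [("build a wall", "construction"),
            ("doorstop", "weight"),
            ("paperweight", "weight"),
            ("grind into pigment", "material")],
    "ben": [("build a wall", "construction"),
            ("build a house", "construction")],
}

# How many people in the sample produced each idea (for originality).
idea_counts = Counter(idea for ideas in responses.values()
                      for idea, _ in ideas)
n_people = len(responses)

def score(ideas):
    fluency = len(ideas)                          # number of ideas
    flexibility = len({cat for _, cat in ideas})  # distinct categories
    # Mean rarity: 1 minus the share of the sample giving the same idea.
    originality = sum(1 - idea_counts[idea] / n_people
                      for idea, _ in ideas) / fluency
    return fluency, flexibility, originality

for person, ideas in responses.items():
    print(person, score(ideas))
```

Here "ann" scores (4, 3, 0.375) and "ben" scores (2, 1, 0.25): more ideas, more distinct categories, and rarer ideas all raise the scores, mirroring the fluency, flexibility, and originality indices described above.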

Based on these process-oriented measures of creative thinking, Lubart, Besançon, and Barbot (2011) proposed the Evaluation of Potential Creativity (EPoC). This test battery is organized by domain of creation (visual art, literary-verbal, social, mathematical, scientific, musical, and body movement). In each domain, there are two types of tasks: divergent-exploratory thinking, in which as many original ideas as possible are generated, and convergent-integrative thinking, which involves generating one elaborated production that takes into account the elements provided. As illustrated in the graphic-artistic domain, one task is to generate as many sketches as possible in a limited time using a graphic form or image that is provided. In Figure 15.3, a child produced 10 drawings using the banana shape. Using norms for children of the same age, it can be noted that this is a relatively large number of ideas, slightly more than the average child produces. In Figure 15.4, several children's drawings from the convergent-integrative task are illustrated. In this particular task, photos of eight objects are presented and the children make a single elaborated drawing that integrates at least four objects. The extent to which the drawing integrates the objects, the number of objects used, and the originality of the drawing are assessed. In the first drawing illustration (drawing 4A), the child has arranged the objects in a typical fishing scene, whereas in drawing example 4B there is a greater integration of objects, which form a single "rabbit" composed of a valise, light bulbs for feet, and carrots for ears.

Figure 15.3: Responses to a divergent-exploratory task in the EPoC battery.

Finally, in example drawing 4C, a highly original idea of a "Samurai" warrior (as named by the child) uses all eight objects, integrated in unusual ways, with the sword formed by a carrot and a wooden manikin's body, the warrior's head being made of the fish, and the arm made of a shovel. The creativity of the integrative drawing is assessed by judges who examine the number of objects used and the originality of the resulting drawing production.

Emotions are an integral part of the creative process. Engaging in creative productive work may allow individuals to express their emotions, or alternatively may lead people to experience emotions resulting from their creative thinking process. A large number of studies have examined the impact of positive and negative mood states and emotional arousal on the creative process (Baas, de Dreu & Nijstad, 2008). There are mixed results, but one of the main findings is enhanced divergent-thinking productivity in the presence of a positive mood state, perhaps due to more relaxed evaluative criteria for deciding that an idea is worthy of some attention (Davis, 2009).

Part of understanding the natural creative process involves recognition of the diversity with which it can unfold. The creative process varies from individual to individual, but also across tasks and within the different domains. Thus, the creative process in the visual arts is not necessarily the same as the creative process in engineering or musical composition. Within these domains, the creative process of sculpting is not necessarily the same as the process of painting. Additionally, each creator may engage in his or her own personalized sequence, and bring the ingredients to bear at different moments during the creative act. Recent work has sought to compare and contrast the creative process across domains (Lubart, 2018). For example, using an action-theory approach focusing on the impetus, the activity engaged, the materials used, and the social connections involved, Glaveanu and colleagues (2013) observed differences and similarities across descriptions of the creative process based on interviews with visual artists, writers, scientists, designers, and music composers.

In addition, it is possible to contrast the process traces of individuals who show relatively high levels of creativity in their productions in a given task with those who show relatively low levels of creativity in the same task (Lubart, 2018). The results of this type of study show that contrasting sequences of specific activities (such as idea evaluation, association, taking a break from work, etc.) characterize the more successful creators in comparison to less successful ones. For example, in a study of fine-arts students in a sculpture task, those who were judged to be highly creative showed different process traces (based on a self-report diary) when compared with those who were not very creative: after defining the problem, those who were more creative in the end tended to seek information, whereas those who were less creative tended to start their sculpture right away. In addition, when returning from a break, students who reengaged with the sculpture by associating new ideas with their project tended to be more creative in the end than those who reengaged with their sculpture work by critiquing what they had accomplished up to that point. In other process-tracing work, Pringle and Sowden (2017) examined the creative process in a garden-design task and found that tightly linked shifts between associative and analytic processing modes were characteristic of the most creative work. In general, it is increasingly recognized that the creative process is a dynamic flow that offers nearly unlimited opportunities for individual differences (Beghetto & Corazza, 2019).

Figure 15.4: Children's responses to a convergent-integrative task in the EPoC battery (4A: fishing scene, 4B: Rabbit, 4C: Warrior). ©2011. Editions Hogrefe France. Reproduced by permission from Hogrefe France.

Some work has, additionally, focused on methods that formally structure the process of creating, in order to help creators enhance the originality of the resulting productions. Thus, a large literature exists on creative thinking methods designed to guide the creative process through brainstorming (a divergent thinking-based procedure), lateral thinking (flexibility-based techniques), creative problem solving methods (strategies sequencing and integrating divergent and convergent thinking techniques), TRIZ (Russian acronym for the "Theory of Inventive Problem Solving", based on analyses of inventors' methods), and design thinking (user-oriented techniques), just to mention some of the most developed methods (Brown, 2008; De Bono, 2010; Osborn, 1953; Puccio & Cabra, 2009). The term "creative thinking method" is used here to describe a structured-process approach that may be composed of several steps and may deploy several specific thinking techniques within the global method.

For example, *creative problem solving* is a formalized method composed of several steps, such as exploring the challenge (problem finding and formulating), generating solutions, and generating an action plan for solution implementation. Within each step, which can occur in dynamic sequences, several techniques can be employed. One example is a problem-exploration technique in which an initial problem statement is proposed and then each word is expanded to become a list of synonyms. Based on the alternative words, the problem space can be explored, and perhaps a new problem formulation will offer original opportunities and approaches for idea generation. For example, given an initial problem statement, "How can we raise sales of toys in our store?", several alternative words could be listed for "sales" (profits, client satisfaction), "toys" (games, hobby items), and "store" (internet site, shopping mall outlet). Based on the alternate words, a new problem formulation could be: "How can we raise client satisfaction of game items in our shopping mall outlet?". This problem may lead to very different solutions than the initial one, because divergent exploratory thinking applied in the problem-formulation phase opens up the range of options. As John Dewey noted, a problem well stated is half solved.

In general, it is also important to note that the creative process is a meaningful endeavor, which assumes that it is, and should be, to some extent goal-driven and purposeful. The meaning and goal of creating may of course be defined at a strictly personal level (intrapsychic), or at a social level, as in productions generated for one's familial or professional setting. Thus, special cases in which an agent engages in random acts with no goal or recognition of seeking a creative production (such as a human or non-human typing random keys that yield a "text") will not typically be considered part of authentic "creating", even though a production that has some interest may eventually result from this random activity.

#### 15.3 Collaboration: Co-Creation

Collaboration refers to the process through which two or more people, often with different or complementary skills, engage in shared creation, frequently producing something that they could not or would not produce on their own. From the science of Marie and Pierre Curie to the cubism of Pablo Picasso and Georges Braque and the music of the Beatles, the history of great cultural contributions demonstrates that much creative genius results from collaboration—from the extraordinarily important and enhancing effects of support, differing and complementary skills and dispositions, and even the competition that dyads and groups provide (see Clydesdale, 2006; John-Steiner, 2006). Today, thinkers from many different fields believe that the future of human work will be both more creativity-focused and more collaborative in nature. A study of almost 20 million research papers and 2 million patents over 45 years, for example, showed that the number of coauthors had almost doubled during that time, and also that multi-authored papers were more likely to be cited in the future (Wuchty, Jones, & Uzzi, 2007). The lone creative genius may still appear in some fields, but given the effects of globalization, increasing technological complexity, and the concomitant specialization of expertise, in many areas of endeavor collaboration is becoming more of a necessity.

From another perspective, however, one can also clearly argue that *all creativity* is, and always has been—at least implicitly—collaborative. Every work of art or scientific discovery, for example, is based on shared, pre-existing foundations of culture and language, as well as ideas and methods borrowed from more immediate disciplinary predecessors. Some creativity is simply more easily recognized and labeled as "collaborative" because of its proximity in time or space to the others that helped make it happen. Einstein's discoveries, no matter how single-handed and revolutionary they might seem, would have been impossible without the history of science before him. And, as commonly observed, no single individual knows how to make a new pen, an automobile, or the majority of common cultural objects in their entirety, because the materials and knowledge involved come from many different sources.

Thus, the enterprise of understanding creativity should not, in fact, be confined to intra-individual psychological investigations, but must instead also be pursued social-psychologically or sociologically at inter-personal and systemic levels. Such multileveled approaches to creativity were relatively uncommon until recently, but they do have some good foundations in the field. For example, Csikszentmihalyi (1988) proposed a "systems model", and helpfully asked not "what is creativity?" but "*where* is creativity?" The answer, as Figure 15.5 suggests, is that "creativity"—whatever one decides it is—is found in the triangular inter-relationship between the *individual* talent, the parameters of the particular creative *domain* in which a person works, and the *field* of experts that help define and identify the other two components. Another good starting point within psychology can be found in Vygotsky's sociocultural developmental approach (John-Steiner & Mahn, 1996), which (seeing human cognition as developing through social dialogue) also offers the possibility of a multilevel approach to creativity.

Figure 15.5: Csikszentmihalyi's (1988) systems view of creativity.

Psychologists could learn a great deal from entirely sociological work, such as Farrell's (2001) description of the life cycles of "collaborative circles" of people who participate in the co-creation of a movement in art, literature, science, or other fields. Drawing on close study of groups like Sigmund Freud's early followers and the famous Oxford "Inklings", which included J.R.R. Tolkien and C.S. Lewis amongst its ranks, Farrell shows how the group dynamics that accompany and generate creativity often seem to pass through seven stages: 1) group formation; 2) rebellion against authority; 3) questing and the development of new visions; 4) creative work (a stage when ideas are refined, often in direct dialogue and collaboration); 5) collective action, when larger projects are taken on; 6) separation, when differences cause disintegration of the group; and 7) nostalgic reunion. Working in similar directions and developing some of his own tools, psychologist Keith Sawyer proposed the notion of "collaborative emergence", which aims to supplement individual-level explanations with appropriately collective ones for more ephemeral or entirely collaborative creativity like jazz and improvisational theater (see Sawyer, 2010, 2017).

Most research on creative collaboration can be categorized further into two types: 1) small, laboratory-based "group studies", usually with no more than two to four members—often students—who are temporarily assigned to a group and observed under carefully controlled conditions, and 2) "team studies" of groups that are embedded in organizations and whose members are, therefore, in longer-term, less artificially arranged relationships, and whose size and structure vary as decided by supervisors for practical reasons, rather than being scientifically structured for experimental purposes. Although laboratory groups and organizational teams appear to engage in collaborative processes that can be described similarly (Mullen, Driskell, & Salas, 1998), most of the research on task performance and group creativity consists of lab group studies, whose weakness is their distance from real-world contexts and relationships. With team studies, on the other hand, it can be very difficult to determine whether results are caused by differences in group composition or by the processes in which the groups engage (Paulus, Dzindolet, & Kohn, 2012).

The actual goal of collaboration can be seen somewhat differently in different settings. In small-group research, the target is usually "creativity", with the short life of these groups focusing attention on the idea-generating stages of the process. Team research in organizational settings, in contrast to small-group research, more often claims "innovation" as its target. In this regard, the distinction often made (but not always finding support) is that innovation as a concept is larger or more encompassing than creativity, innovation including an emphasis on successful implementation following initial idea generation.

Whereas some theorists are less accepting of the creativity/innovation difference, in practice, organizations tend to make the distinction, with CEOs, for example, generally seeing three types of innovation as shaping their goals at work:


Leadership has become inextricably linked to creativity through collaboration and their common, fundamental focus on problem-solving and organizational and social change (Puccio, Mance, & Murdock, 2010). The recent rise of more empathy- and collaboration-centered approaches to creativity, such as design thinking and even "design leadership", further underscores this important relationship (Thornhill-Miller & Muratovski, 2016).

As we have argued, creativity is often collaborative and distributed. Economic history suggests it is, in fact, *collective* creativity and intelligence—the swift trade of ideas possible with a critical mass of population density and division of labor through specialized occupations—that has helped make humanity the planet-shaping force that it is (Ridley, 2010). The internet economy, virtual teams, online distributed problem-solving, and other forms of "crowdsourcing" creativity are all now established enough to become subjects of study (Gippel, 2018). Further applications and the rise of future technologies of collaboration seem poised to magnify the processes that already exist and are likely to be revolutionary in additional ways.

# 15.4 Contexts: Environmental Conditions

The creative context is composed of both physical and social spheres. It can be described as a multilayered environment in which a person's local family, school, and work contexts are nested within larger geographical, regional, national, and international contexts. There is a large literature on the impact of context on creativity (see Harrington, 2011). For example, children in a classroom with stimulating posters on the walls tend to produce a greater number of ideas, and more original ideas, on a divergent-thinking task than children in a classroom without posters (see Beghetto & Kaufman, 2017). Some companies have a creative space, with colorful walls or furniture, white boards, and some play spaces featuring a basketball hoop or table football. Research has examined features of workplace environments, such as the presence of windows, a view of nature, wall color, odors, noise levels, temperature, light levels, the presence of green plants, and open-plan office organization. All of these environmental features can impact creativity, although the ideal conditions vary to some extent across the samples studied. The environment provides the affordances that set the stage for creativity to occur; for example, an individual who has access to musical instruments and role models has a greater opportunity for musical creation than a person with more limited access.

Dul (2019), in a survey of these studies, suggested that environments can support creativity in three fundamental ways, by providing:



Figure 15.6: Conditions from a study of virtual environments (6A: Real meeting room; 6B: Virtual meeting room; 6C: Virtual artist's house, see Guegan, Nelson, & Lubart, 2018). Credits: J. Guegan & J. Nelson.

(c) a socio-emotional context that supports idea generation (such as a positive ambiance supported by "happy" colors and music).

A recent series of studies looked at the effects of various environments using a virtual reality paradigm. Working within Linden Lab's *Second Life*, an online multi-user virtual environment, we created several workspaces, which were designed to represent a neutral meeting room and a supportive artist's studio, with many objects and attributes that previous research showed participants associate with a positive, creative space. These workspaces are illustrated in Figure 15.6. Students in preliminary studies described features of creative workspaces, and these were then designed in the virtual world. New participants were assigned randomly to one of the rooms in this experimental study (in which they worked via their avatar), and a "real-life" control condition with a real meeting room was also included (in which participants worked while physically present, termed "first life"). Using a standard divergent-thinking task to find unusual uses for a common object, we observed that students assigned to the "artist's house" produced significantly more ideas than those in the virtual meeting room and the real meeting room (Guegan, Nelson & Lubart, 2017). These latter conditions did not differ significantly from each other. In addition to fluency, the originality of ideas showed the same pattern, significantly favoring the artist's house condition. Thus, this study demonstrated the direct effect of the physical environment on creative output.

In another line of work, numerous studies focusing on organizational environments examined the social-contextual features related to creative workplace behavior. In most studies, respondents described their workplace by questionnaire and reported on their creative accomplishments. Based on the meta-analysis by Hunter, Bedell & Mumford (2007), there is clear evidence for the importance of


Case studies in diverse fields, such as businesses inventing new products, provide further evidence for these findings. The invention of Post-Its® at 3M, for example, was facilitated by support for risk taking and trying new ideas (time and budget resources made explicitly available for such projects), by support for idea development through internal competitions for new ideas and through idea champions (resource people who help inventors move their projects forward), and by top management goals for the company to generate a large percentage of its future revenues from products that remain to be invented.

Beyond the workplace, research has investigated a wide range of contexts, from the family environment to macrosocial units such as cities, nations, and international settings (Harrington, 2011). With respect to the family context, many important variables have been identified, including an enriched home environment with stimulating activities, access to cultural activities, role models of creative people (who may be a child's own parents), a flexible parenting style that provides structure but also liberty, and support for a child's expression of his or her originality and perhaps idiosyncratic and imaginative interests. All of these factors are supportive of later creative development and accomplishment, according to biographical studies of eminent creators. However, some studies also point out that distress, trauma, stress, and adversity present in the family environment may lead to resilience and character-building, which also serve to support later creative accomplishment (Kohanyi, 2011). There is therefore some evidence that family environments favoring creative development are complex, with some positive features supporting creativity (epitomized by Carl Rogers's theory of parents who provide psychological safety and freedom) and perhaps some negative conditions or hardships that help develop perseverance, motivation, and other traits that are important for creativity (see Kohanyi, 2011).

Historically, there are numerous examples of cultural spaces, like Florence in the Renaissance, late 16th century London, and early 20th century Paris, which illustrate the effects of a fertile setting for creative activity. These "creative cities" are typically located near other cultural centers, and offer the opportunity for multicultural experiences, which have also been positively linked to creativity. Creative cities provide a critical mass of people interested in cultural events and financial support for creative work which, in turn, attracts the creative class of artists, writers, designers, scientists and others in creative fields (Florida, 2005).

Research on cultural variations and creativity indicates that nuances of the definition of creativity, the domains in which creative work is valued, and the extent to which creative work is encouraged are all subject to variation. Some cultures value the production that provides evidence of creative thinking, whereas others focus relatively more on the creative act itself. In some cultures, creativity is more an individual act, whereas in others it is inherently more collective. Some cultures express a strong need for certainty or respect for tradition, which may place less value on risky, culturally novel endeavors (see Lubart, Glaveanu, De Vries, Camargo & Storme, 2019). According to the sociocultural approach, creativity is embedded as a phenomenon in a cultural time and space. It is inconceivable to separate creative thought from the cultural matrix that supports it and is ultimately shaped by it (Glaveanu et al., 2019).

# 15.5 Creations: The Nature of Creative Work

The creative process results, in general, in a new state (*outcome* state) that is more or less different from the starting state (*initial* state). This new state may range from being slightly different to being radically different from the initial state. In general, the new outcome state will be substantiated by a production—a "creation"—that was not present initially. For example, an artist may start with a blank canvas and, by painting, transform it into a finished work. A writer may start with a blank page and a pen and end with a poem written on the page. These creations are "traces" indicating that a process was engaged. The creation, or production, may be tangible (such as a sculpture) or intangible (such as an idea). The extent to which the resulting creation is deemed to be original and valuable, however, is what will determine the creativity of the work. Not all creations are original or valuable. For example, a perfect copy of a famous painting is a creation; it may be valuable and appreciated by viewers for the technical skill that was required, but it is not original. To take another example, a very original sequence of words, generated perhaps by choosing words from random pages of a dictionary, that makes no sense to readers or to the author him- or herself is a textual creation: a sequence of words. However, because it has no meaning, it is not considered creative. Thus, productions which are strange or bizarre and original but without value are not considered creative work.

The creative nature of a production can be determined by appreciating the originality and value of the work. In the first instance, creativity can be assessed by the creator, but ultimately, in most cases, this evaluation is made socially: there is a peer or expert review of the production, which situates the work with respect to other existing work. Thus, most creative work exists in a social setting, is destined to exist in a social context, and is evaluated by informed others. This social conception of creativity was formalized by Amabile (1996) in the "consensual assessment technique". In this measurement approach, qualified judges independently evaluate a set of productions on a rating scale using their own criteria for creativity, and then the average judgment is calculated for each production. In most cases, the judges need to be knowledgeable in the domain being assessed. Some studies have examined the criteria that judges use and the variability in these criteria across judges. In general, the most important criteria are originality (or novelty) and the value of the work. Some authors have proposed creative-product rating scales that help structure the judgment process by using a set of detailed descriptors. For example, Besemer and O'Quin (1986) have a rating scale in which descriptors concerning novelty, surprise, utility, authenticity, and other characteristics can be attributed to a product to code its degree of creativity. Studies of global creativity ratings show that judges may weight these diverse criteria more or less strongly and may integrate information about the criteria in various ways.
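
In computational terms, the consensual assessment technique boils down to averaging independent ratings per production. A minimal sketch (the judge names, rating scale, and data are invented for illustration; real applications also check inter-judge reliability):

```python
def consensual_assessment(ratings_by_judge):
    """Average independent judges' creativity ratings per production.

    ratings_by_judge: dict mapping a judge to a list of ratings, one per
    production, given independently by each judge using his or her own
    criteria for creativity. Returns one consensus score per production.
    """
    per_production = zip(*ratings_by_judge.values())  # regroup ratings by production
    n_judges = len(ratings_by_judge)
    return [round(sum(r) / n_judges, 2) for r in per_production]

# Three hypothetical judges rate four productions on a 1-7 scale
ratings = {"judge_A": [6, 3, 5, 2],
           "judge_B": [7, 2, 4, 3],
           "judge_C": [5, 4, 5, 1]}
print(consensual_assessment(ratings))  # [6.0, 3.0, 4.67, 2.0]
```

The averaging step is trivial; the substance of the technique lies in who the judges are (knowledgeable in the domain) and in the independence of their ratings.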

In general, the dual criteria of originality and value may have particular nuances in each domain of activity. For example, in engineering, the value may be the utility of an invention to solve an existing technical problem with a minimum of resources, whereas in visual arts, value may be framed in terms of the positive aesthetic experience or feeling of surprise or connection that the work produces in viewers. In addition, the relative importance of originality and value may differ in these two exemplary fields, engineering and visual art. Perhaps, for some, creativity judgments in the visual arts depend mainly on originality and secondarily on aesthetic value of the work, whereas in engineering, these two main criteria have equal importance.

The criterion of originality deserves special attention. It is possible to code originality in a statistical way, in terms of the prevalence with which an idea is produced in a given sample of people. Thus, when asked to list unusual uses for a box, a person may say it can be used to store things. This idea is quite common and not at all original. In contrast, the response that the box can be burned to provide a source of heat is quite rare, and statistically infrequent. It is "original" because it is rare or has a low frequency in a statistical sense. This statistical coding can provide support for evaluating the creativity of productions, but it has several limitations, including the significant burdens of requiring a comparison sample and the counting of the frequencies of all responses given for the task, as well as the fact that the value dimension is not taken into account.
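
This statistical coding can be stated precisely: an idea's originality is one minus its relative frequency in the comparison sample. A minimal sketch (the sample of responses below is invented for illustration):

```python
from collections import Counter

def originality_scores(sample_responses):
    """Frequency-based originality coding for one divergent-thinking task.

    sample_responses: every response given by a comparison sample.
    Each distinct idea scores 1 minus its relative frequency, so rare
    ideas score close to 1 and common ideas close to 0. (As noted in
    the text, this coding ignores the value dimension entirely.)
    """
    freq = Counter(sample_responses)
    n = len(sample_responses)
    return {idea: round(1 - count / n, 2) for idea, count in freq.items()}

# Hypothetical sample of 20 responses to "unusual uses for a box"
sample = (["store things"] * 12 + ["make a toy house"] * 6
          + ["burn it for heat"] * 2)
scores = originality_scores(sample)
print(scores["store things"], scores["burn it for heat"])  # 0.4 0.9
```

The sketch also makes the limitations discussed above concrete: a full comparison sample must be collected and every response counted before any single idea can be scored.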

A creation is a reflection of an individual's creative ability and of the environmental context that contributed to or supported the expression of this ability. It is possible that the judged creativity of a production (through the social consensus of judges or by the creator him- or herself) does not reflect the "true" originality or value of the work. In this case, the judges may be biased and inaccurate estimators of the originality or value, because they may lack the contextual knowledge of the field needed to ground their evaluation. Alternatively, judgments of a work at the moment of the creative act may not reflect the potential value of the production in the future. Corazza (2016) suggested that the potential of a creation should also be considered when evaluating it. This potential can be linked to a work's generative potential, that is, what it may become in the future. This issue suggests that the creation is always context-dependent. A creation may also continue to evolve in terms of its value once it encounters the social world. For example, Nietzsche's literary work was not particularly appreciated when he wrote it, but much later it was evaluated by literary critics as very creative. Furthermore, as previously discussed, there are several kinds of creative contributions, ranging from advancing ideas within a paradigm to reorienting work in a new direction (Sternberg, Kaufman, & Pretz, 2002).

The originality and value of a creation is appreciated with respect to a culturally meaningful reference group. Some cultures especially value contributions that break with tradition, whereas other cultures value creations that work within traditions but renew or extend them. Some cultures value creative work in specific fields like science and technology more than others, such as the arts or humanities. Thus, just as originality is defined with reference to a comparison group, the value of creative contributions is also socio-culturally defined. For example, creative productions that contribute positively to societal development are generally valued across societies, but malevolent creativity, such as novel criminal activity, is not necessarily recognized as a creative production in every context due to the negative impact it has on society (Cropley, Cropley, Runco & Kaufman, 2010). This is, however, a subject of debate and related to cross-cultural variation in the conception and domains in which creativity is valued.

# 15.6 Consumption: The Adoption of Creative Products

Creative productions are embedded in a social context, and may ultimately be adopted by it, becoming an accepted or important part of a particular culture or context. In the case of creativity in professional contexts, this is in principle one of the goals of the creative act. The "C" of consumption highlights the link between creativity and innovation. For many authors, an innovation refers to creativity in its applied context of consumption, with a focus on new products or services.

At a macro-economic level, the consumption of creative goods and services has been recognized as one of the main sources of long-term sustained economic growth since the industrial revolution (Lubart & Getz, 2011). Indeed, the creation of new products, new services, or, more generally, new ideas that have some market value leads to opportunities to increase the diversity or quality of goods and services. Sometimes the introduction of new goods eliminates the value of previously existing goods, which Schumpeter (1942) called "creative destruction". For example, the creation of automobiles has essentially eliminated the need for horse-drawn buggies. In general, novel products or services that meet a need will attract attention and create economic growth. Thus, creativity is recognized by the Organization for Economic Cooperation and Development (OECD) as a crucial part of economic activity. In the educational domain, creativity is considered a 21st-century skill, and the World Economic Forum lists creativity as a key capacity for employability in the next decade (World Economic Forum, 2016).

At the microeconomic level, some consumers are attracted to creative goods for their inherently stimulating value. They offer an element of the unknown and a discovery-oriented experience, which consumers value. To the extent that people seek these creative goods and services, the market will value them, and potential creators will be attracted to invest their mental and financial resources in the production of more new ideas. Thus, the consumption of creativity fosters more creativity.

Some members of the public are more ready than others to adopt new ideas, new products, or new processes. The characteristics of lead users, or early adopters of creative goods, are somewhat similar to those who create themselves; they tend to be open-minded, curious, and sometimes they are themselves creative individuals. Furthermore, it is possible to consider that when people consume creative goods, they may contribute themselves to inventing unexpected uses of the product. In some cases, consumers are directly involved in the product design process. This co-design, or user-based participatory design, illustrates how the public can be associated directly with the creative process.

Another way in which consumers express their creativity is through the customization of products. Customization enhances the utility of a product, thanks to the creative act of the consumer. This customization can range from a small act of individual expression, such as decorating one's computer with decals that reflect personal interests, to modifying a piece of standard furniture or painting a motorcycle in a special way. An example of large-scale consumer participation in the creative process of product development is the invention of new SMS acronyms or abbreviations by telephone users, which enhanced the value of SMS messages for communication by leading to a linguistic corpus of new shared terms that are particularly useful.

#### 15.7 Curricula: Developing Creativity

The term "Curricula" focuses on the development, education, or enhancement of creativity. This topic is the subject of growing interest at all levels of the educational system: primary, secondary, postsecondary, and continuing adult training. Here we can summarize several lines of work to provide a broad overview.

First, there are pedagogies that seek to stimulate creative thinking in a global way. These pedagogies have been used most often at the elementary and secondary-school levels; two examples are Maria Montessori's and Célestin Freinet's approaches. These pedagogies can be considered active learning methods because the child thinks in inventive ways by engaging in activities to discover concepts. In these pedagogies, domain-situated content (such as creating a school newspaper) is produced in the course of project activities in the classroom. Thus, these active pedagogies serve as a form of creativity training by engaging pupils in creative activities, and results comparing these types of pedagogies to more passive learning approaches suggest benefits for developing creativity (see Besançon & Lubart, 2016).

A number of studies have examined how school grades are related to creative thinking. A meta-analysis by Gajda, Karwowski, and Beghetto (2017) showed that there was, in general, a positive but weak correlation, suggesting that school performance is only slightly related to creativity; this may be because factors such as general motivation and knowledge of particular disciplines are important for both creativity and school achievement. Other research on characteristics that are important for creativity, such as risk taking and failure tolerance, suggests that school reward systems focused on good grades for getting the "right" answer may actually diminish risk-taking behavior over the long term (Clifford, 1988). The impact of school environments on the development of creativity is a complex topic that is increasingly drawing attention (see Beghetto & Sriraman, 2017).
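The "positive but weak correlation" reported by such a meta-analysis is typically obtained by pooling per-study correlations. As a minimal sketch (with made-up study values, not the meta-analysis's actual data), each correlation can be Fisher z-transformed, averaged with sample-size weights, and transformed back:

```python
import numpy as np

# Hypothetical per-study correlations between school grades and
# creativity scores, with sample sizes (illustrative values only).
rs = np.array([0.15, 0.22, 0.10, 0.30, 0.18])
ns = np.array([120, 85, 200, 60, 150])

# Fisher z-transform each r, average weighted by n - 3, transform back
zs = np.arctanh(rs)
mean_z = np.average(zs, weights=ns - 3)
mean_r = float(np.tanh(mean_z))  # pooled correlation, about 0.17 here
```

The n - 3 weighting reflects the sampling variance of the Fisher z statistic; larger studies pull the pooled estimate more strongly toward their observed correlation.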

Second, there are training programs or activity modules which can foster creativity. These programs tend to focus either on teaching knowledge about creativity itself or on training specific creativity techniques.


In the first kind of learning program, knowledge and expertise on creativity can be taught in order to raise awareness. For example, it is possible to explain the concept of creativity to children or adults, which will demystify it and facilitate the adoption of a view of creativity as an ability that can be developed. Another form of training provides examples of more and less creative productions so that people have a knowledge base against which they can compare their own ideas or judge other people's ideas (Storme et al., 2014). This knowledge about the criteria for creativity allows a person to be a better judge of their own ideas. Additionally, creativity can be taught through role modeling of creative behaviors demonstrated by the teacher, or through case studies of creative people who can be sources of inspiration (see Starko, 2014; Kelly, 2016). Finally, some training programs, such as sequences of exercises to stimulate divergent thinking, have been developed (see Isaksen & Treffinger, 1985; Mansfield, Busse & Krepelka, 1978). These training sequences focus, in most cases, on practicing divergent thinking or insight problem solving and mental flexibility. Ma (2006) conducted a meta-analysis of the impact of these creativity training programs and found that participants in a multi-week training program improved their creative-thinking skills by about half a standard deviation on average.

Table 15.2: Example Creativity Enhancing Strategies & Techniques.

#### Brainstorming-like techniques

#### Perspective- & Frame-changing Techniques

A broad family of techniques with different subtypes, all aiming to change the frame of reference in which a topic or problem is considered. Deformation techniques produce new ideas by changing or distorting the topic or reality in some systematic way, for example by removing part of it, looking at it backwards, magnifying it, making it smaller, or seeking serendipitous input. Projective techniques involve using the imagination to place oneself in another mental perspective or another person's emotional situation. These include role-playing games, empathy- or imagination-based projective profiling techniques, and other "detour" techniques to radically shift one's point of view and processes of considering a problem.

*"Lateral thinking"*: Used as a generic term, refers to a large group of procedures helping to approach a problem from a new angle, for example "deforming" the problem through exaggeration or minimization, reversing the order involved, deleting elements, or inverting the goal (i.e., if the goal is improving a product or a process, instead exploring all the ways to make it worse, as a means of pursuing insights to make it better). De Bono (2010)

*"Disney method"*: A process for creative generation attributed to film pioneer Walt Disney, according to which one produces ideas by taking on different roles and the thinking styles of the dreamer, the realist, and finally, the critic or spoiler in successive steps. Dilts (1994)

*"Daydream"* ("Rêve éveillé"): A technique fostering a "detour" in perspective that involves pretending to enter into the world of dreams and imitating them in various ways, thus creating distance from reality and facilitating the emergence of new ideas. Aznar (2005)

#### Associative & analogic techniques

A group of techniques focused on making connections between the problem or topic of interest and other topics, ideas, or objects. The target for association can be unspecified and left open for individuals to freely find any and all relationships (in a manner more similar to brainstorming techniques), or the targets can be "forced" on a particular topic, often requiring more remote associations and leading to more analogic thinking (in a manner similar to perspective- and frame-changing techniques).

*"Mindmapping"*: A drawing-based method of escaping linear thinking and generating new ideas by drawing the central concept in its web of associations with other issues, characteristics, and ideas. Buzan & Buzan (1996)

*"Bisociation"*: A technique, and fundamental creative process, whereby two objects, frames of reference, or systems of relationships that are usually separate are combined or applied to each other, allowing something new to emerge. Word puns, or Edison's combining of the once-separate ideas of "electricity" and "light" to invent the light bulb, are good examples. Koestler (1964)

In terms of programs that teach specific creativity techniques, these are often geared to adults in workplace contexts. The long history of idea-generating strategies and creative problem-solving techniques provides substantial support for the "trainability" of creativity on the individual and group levels (Nickerson, 1999; Scott, Leritz, & Mumford, 2004). From Osborn's contributions to creative problem solving and the idea of "brainstorming" (Osborn, 1953) and Gordon's (1961) synectics (an analogy-based creativity technique), to Buzan and Buzan's (1993) mindmapping (a visual representation technique) and more recent work on design thinking (e.g., Brown, 2008; Darbellay, Moody, & Lubart, 2017), a wide range of strategies and techniques have gained popularity due to their perceived practical value in applied situations. Although there is substantial overlap and they can be classified in different ways, a brief taxonomy of some important strategies and techniques might include at least three general categories, which are presented in Table 15.2: brainstorming-like techniques, associative and analogic techniques, and perspective- or frame-changing techniques (see Thornhill-Miller & Dupont, 2016; Debois et al., 2015, for more detailed taxonomies and further explanations).

The neurophysiological enhancement of creativity has recently become another prominent topic in the creativity-training literature. There are many competing neurobiological theories of creativity, ranging from hemisphere-dominance theories (see Mihov et al., 2010, for a review) and more specific regional specialization theories (e.g., Flaherty, 2005) to general neurological connectivity theories (e.g., Thalbourne, Houran, Alias, & Brugger, 2001) and lines of research now focusing on brain activity at the moment of insight (Kounios & Beeman, 2014). The neuroscience of creativity is providing a growing understanding of the brain areas involved in creative thinking (e.g., Abraham, 2013; Arden et al., 2010; Beaty et al., 2016; Dietrich & Kanso, 2010; Gonen-Yaacovi et al., 2013; Jauk et al., 2013; Jung et al., 2010; Vartanian, Bristol & Kaufman, 2013).

Martindale conducted an important series of experiments demonstrating that low cortical arousal was associated with superior performance on creative-thinking tasks, and that creative individuals showed more variability in arousal, especially during moments of creative inspiration (Martindale, 1978; 1999). He observed a clear decrease in levels of cortical arousal (as measured by alpha waves) among highly creative study participants as they shifted from analytic thinking to convergent creative thinking (on the Remote Associates Test) to divergent thinking (using the Alternate Uses Test). Similar results with different tasks, also suggesting differential recruitment of the parietal and frontal cortex in high versus low creatives, have appeared more recently (Jauk, Benedek, & Neubauer, 2012). Particular patterns of cortical arousal could be important to induce the different kinds of cognitive activation observed in successful execution of each stage of the creative process.

More directly important, however, is a strand of research on non-invasive brain stimulation (e.g., transcranial direct current stimulation, tDCS, and transcranial alternating current stimulation, tACS). Transcranial stimulation involves passing a weak electrical current between two poles over the scalp, which modulates the excitability of neural tissue in the region, either increasing or decreasing it depending upon the polarity. Of particular interest, a small group of studies showed that tDCS and related techniques can enhance creative thinking and problem-solving ability. In one particularly dramatic example, Chi and Snyder (2012) found that 40% of their study participants who received tDCS over their anterior temporal lobes (in order to shift them toward right-hemispheric dominance) were able to solve a difficult insight problem (the "9 dot problem") that none of the unstimulated participants in their study solved. Cerruti and Schlaug (2009) were able to use tDCS to enhance convergent creative thinking on the Remote Associates Test. And Goel et al. (2015) have now also shown that it can be used to differentially modulate convergent/insight problem thinking and divergent thinking (see Zmigrod et al., 2015).

One major challenge that methods of brain stimulation must overcome to make even larger contributions to the enhancement of creativity (or the understanding of any complex state) is, of course, the difficulty of identifying the entire complex pattern of scattered activations involved in a particular mental state (e.g., the moment just before insight) or over time (e.g., during the different stages of the problem-solving process). Here the brain "connectome" approach—a wiring diagram or mapping of neural connections in the brain to study the structure of networks—is promising (Sporns, 2014; Deco et al., 2018). Much like biofeedback, neurofeedback based on EEG oscillations (alpha/beta) can be used to enhance cognition through mental training. Recently, the causal role of beta oscillations in divergent-thinking performance was highlighted by research showing that training self-control over brain activity specifically related to creative thinking could be particularly effective in producing a significant increase in individual creative potential (Agnoli, Zanon, Mastria, Avenanti, & Corazza, 2018).

#### Summary


#### Review Questions


#### Hot Topic: Navigating the Future of Creativity

#### Complexity of Measurement, Connected Constructs, and Computer Technology

Todd Lubart

Readers might find it surprising that after almost a century of concerted empirical effort, the measurement of creativity actually remains a challenge in research and applied settings. Following the "multivariate approach" (discussed in section 15.1 and also illustrated by Table 15.1 and Figure 15.2), the authors have been developing the "Creative Profiler", a multidimensional psychometric tool that gathers together research-validated measures of the full range of cognitive, conative, socio-emotional, and environmental resources that the literature suggests contribute to creative potential and performances of all kinds.

The Creative Profiler aims to enhance our understanding of creativity in general by offering "high resolution" mappings of the different resources that actually contribute to more or less creative performance in different professions (e.g., among designers, managers, lawyers, clinicians, or teachers), in different domains (e.g., visual arts vs. scientific research), or on different specific tasks (e.g., writing a poem vs. writing a story). More information about the components, methods, and kinds of groups we are seeking to profile and train can be found on the Creativity and Innovation Profiling Project's website, CreativityProfiling.org.

Creativity's complexity and cultural embeddedness also link it to a constellation of other "hot topics" in psychology and society, such as leadership, intelligence, design, culture, and spirituality, many of which have also proven challenging to operationalize in research.

Branden Thornhill-Miller

Creativity's long association with "madness" in the popular imagination, for example, has now been scientifically redefined in a manner that suggests some of this creativity might be linked, instead, with group-enhancing and culture-shaping individual differences in the tendency to experience more wonder and/or to have more unusual emotional or mystical experiences (see Thornhill-Miller, 2007; 2014). In any event, the status of creativity as a universal human capacity and its close association with other quintessentially human activities—from art and spirituality, to language and scientific invention—has led both of us to reflect more deeply on the central role that creativity seems to play in the fundamental question of what it means to be human. Branden coined the terms "*Homo mirans*" (the "wondering ape") and "*Homo syntheticus*" (the concept-synthesizing creature that lives more and more in a world of its own idiosyncratic and synthetic making) to address these definitively human phenomena (Thornhill-Miller, 2007; 2014). Todd Lubart has placed the entirety of the creative process squarely at the center of human identity in his work, adopting the epithet "*Homo creativus*" (Lubart, Mouchiroud, Tordjman & Zenasni, 2015).

Looking toward humanity's creative future, computers and computational technologies offer an exciting new range of possibilities for both research and creativity enhancement—from artificial intelligence, brain-computer interfaces, and whole-brain emulation, to technologies of distributed creativity and direct brain stimulation—some of which we have already discussed. For both of us, however, our work in this area has focused more specifically on the ready accessibility of virtual reality technologies. Our research suggests virtual worlds offer great promise for exploring and expanding our understanding of human creativity (see Burkhardt & Lubart, 2010), and as a means of optimizing traditionally available creativity training and enhancement options (Thornhill-Miller & Dupont, 2016). As current reality now surpasses much of the science fiction of the recent past, it is only a matter of time before our creative capacities will again exceed our imaginations.

#### References




Advancing Creativity Theory and Research: A Sociocultural Manifesto. *The Journal of Creative Behavior*. doi:10.1002/jocb.395


program. *The Journal of Creative Behavior*, *47*(1), 3–21. doi:10.1002/jocb.20



Storme, M., Myszkowski, N., Çelik, P., & Lubart, T. (2014). Learning to judge creativity: The underlying mechanisms in creativity training for non-expert judges. *Learning and Individual Differences*, *32*, 19–25. doi:10.1016/j.lindif.2014.03.002


# Glossary


# Chapter 16

# Wisdom

#### JUDITH GLÜCK

University of Klagenfurt

*"Wisdom is not a product of schooling but of the lifelong attempt to acquire it."* (Albert Einstein)

Most people would probably like to develop wisdom in the course of their lives. However, few people actually become very wise—advice-givers that many turn to, exemplars in the way they live their own lives. What is wisdom, how can we study it from a psychological perspective, and why is it so rare? For a long time, psychologists did not consider wisdom as something that could actually be measured and studied using empirical research methods. Only since the 1980s has wisdom become a topic of psychological research. This chapter first describes how wisdom has been defined by psychologists. Then, it discusses how wisdom can be measured, how it develops, and how it can be fostered by psychological interventions.

#### 16.1 What is Wisdom?

When psychologists first took up wisdom as a topic of empirical research in the 1970s and 1980s, they were not quite certain how this complex and somewhat vague concept could be defined at all. Rather than define wisdom based on theoretical considerations, several researchers decided to start by studying how so-called laypeople—people who had no specific knowledge of the subject—defined wisdom.

#### 16.1.1 People's Conceptions of Wisdom

Studies of what people mean when they talk about wisdom typically start by asking participants to write down all characteristics that they associate with wisdom and wise persons (e.g., Clayton & Birren, 1980; Holliday & Chandler, 1986; Sternberg, 1985; overview in Weststrate, Bluck, & Glück, 2019). Then, researchers go through the lists that participants generated and put together a "master list" that includes all aspects that have been mentioned. New samples of participants are then asked to rate each aspect for how central or typical it is for wisdom. As it turns out, there is considerable agreement between people about the most important characteristics of wisdom. Typically, researchers use statistical methods like factor analysis to group the individual attributes into broader dimensions. A classical study by Clayton and Birren (1980) identified three such dimensions: an affective dimension (including the adjectives peaceful, understanding, empathetic, and gentle), a reflective dimension (introspective, intuitive), and a cognitive dimension (knowledgeable, experienced, pragmatic-observant, intelligent). Other studies have found similar components. These studies show that while wisdom involves knowledge and thinking, it also includes non-cognitive aspects such as empathy, intuition, and self-reflection. In other words, wisdom integrates capacities that are usually studied in different fields of psychology, such as cognition, emotion, and motivation.
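The grouping step can be illustrated with a toy computation. The sketch below uses simulated data (hypothetical attribute ratings, not the ratings from any actual study) in which two sets of attributes each track one latent dimension, then extracts two components from the correlation matrix as a simple stand-in for the factor analyses these studies actually use:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # simulated participants rating six attributes' typicality for wisdom

# Hypothetical attributes: columns 0-2 track an "affective" latent
# dimension, columns 3-5 a "cognitive" one (illustrative data only).
affective = rng.normal(size=(n, 1))
cognitive = rng.normal(size=(n, 1))
ratings = np.hstack([np.repeat(affective, 3, axis=1),
                     np.repeat(cognitive, 3, axis=1)])
ratings += rng.normal(scale=0.5, size=ratings.shape)

# Extract two components from the correlation matrix
# (principal-axis style; a stand-in for full factor analysis).
corr = np.corrcoef(ratings, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]  # largest eigenvalues first
loadings = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])
```

Attributes within a set correlate strongly while those across sets do not, so the two extracted components recover the two underlying rating dimensions, just as factor analysis groups wisdom attributes into affective, reflective, and cognitive dimensions.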

Other research looked at how people describe a concrete wise person: whom do they consider as wise and why? When people are asked to name an exemplar of wisdom, certain names come up again and again, for example, Mahatma Gandhi, Jesus Christ, Martin Luther King, or Mother Teresa (Paulhus, Wehr, Harms, & Strasser, 2002; Weststrate, Ferrari, & Ardelt, 2016). What do these people have in common? While political figures such as Abraham Lincoln and philosophers such as Socrates are also often mentioned (Weststrate et al., 2016), it seems that the people most typically named as wisdom exemplars dedicated their lives to a great cause that involved the well-being of many—they changed the world by peaceful means. Thus, in addition to the cognitive, reflective, and affective characteristics that people associate with wisdom, there is also an ethical or moral aspect to it: wisdom is applying one's capacities for a greater good than just one's own well-being (Sternberg, 2019).

#### 16.1.2 Psychological Definitions of Wisdom

The next step in psychological wisdom research was to develop more theory-based definitions of what wisdom is. Different researchers have based their accounts of wisdom on different theoretical backgrounds, incorporating people's conceptions of wisdom, philosophical and theological conceptions, and psychological research on related capacities. For example, the first definition of wisdom that became the foundation of a large-scale research program was based on studies of expert knowledge, an important topic of cognitive psychology in the 1980s (e.g., Ericsson, Krampe, & Tesch-Römer, 1993; see Chapter 13, "Expertise").

#### 16.1.2.1 Wisdom as Expertise: The Berlin Wisdom Model

Generally, expertise is knowledge acquired through long-term experience and practice in a particular domain—much expertise research has looked at, for example, how chess experts differ from chess novices in how they mentally represent and solve chess problems. In the 1980s, Paul Baltes and his co-workers at the Max Planck Institute for Human Development in Berlin, Germany, argued that wisdom is a special form of expertise: expert knowledge about the fundamental issues of human life (Baltes & Smith, 1990; Baltes & Staudinger, 2000). Some people are fascinated by the difficult questions of our existence: how can we live knowing that we are going to die? How can we balance autonomy and intimacy in our relationships? How can we solve difficult moral dilemmas? While many people do not care a lot about these questions, some are deeply motivated to gain a better understanding of them by observing other people's lives, reading philosophical and psychological literature, and, perhaps most importantly, contemplating their own experiences and trying to learn from them (Ardelt, 2003; Glück & Bluck, 2013). Such people are likely to become experts as they go through life—they accumulate knowledge, experience, and ways of thinking that are well-suited for solving problems and giving advice to others. Importantly, according to Baltes and colleagues, the knowledge that wise people acquire is not only about how problems can best be solved but also about variability and uncertainty: wise individuals know that people can have very different values and priorities, that worldviews and behaviors are shaped by people's life situations and broader life contexts, and more generally, that most things in life are uncertain—that unexpected events can happen at any time and we can only predict the future to a very limited extent. All these insights have taught wise people to be cautious when they suggest problem solutions or give advice. In other words, a wise person is unlikely to just tell somebody what to do in a difficult situation: he or she will listen to the advice-seeker's account carefully, try to take different perspectives on the problem, and suggest more than one possible approach.

#### 16.1.2.2 Wisdom as a Personality Constellation: The Three-Dimensional Wisdom Model

While the Berlin wisdom model considers wisdom-related knowledge—knowledge about facts and strategies, but also about variability and uncertainty—as the key component of wisdom, Monika Ardelt has argued that wisdom really is a personality characteristic (Ardelt, 2003; Ardelt, Pridgen, & Nutter-Pridgen, 2019). Based on the findings by Clayton and Birren described earlier and on theoretical considerations, she argues that wise individuals have a specific personality structure which combines three dimensions: a cognitive dimension that consists of the deep desire to understand life; a reflective dimension defined as a general willingness to take different perspectives and to reflect upon oneself and one's behavior; and an affective dimension characterized by compassionate love for others. Ardelt certainly agrees with Baltes and colleagues that wise people have a lot of knowledge about life, but she believes that the personality dimensions are what enables people both to acquire that knowledge and to apply it to real-life problems. While the Berlin model assumes that wisdom can be learned from observing other people such as wise mentors, Ardelt has argued that wisdom is not gained by reading books or observing other people's lives: she believes that wisdom comes from personal, internalized insights that develop as people experience and navigate difficult challenges in their own lives (Ardelt, 2004, 2005). Such challenges, according to Ardelt, can change a person and make him or her wiser. Thus, while Baltes and colleagues assume that wisdom is a body of knowledge that can exist outside individuals—for example, in books or proverbs (Baltes & Kunzmann, 2004)—Ardelt says that wisdom is inextricably connected to an individual's personal life story.

#### 16.1.2.3 Other Definitions of Wisdom

The Berlin Wisdom Model and the Three-Dimensional Wisdom Model are probably the two most-studied conceptions of wisdom. They are also typical examples of two types of definitions in wisdom literature: some definitions focus on aspects of wisdom-related knowledge and wise thinking (e.g., Grossmann, 2017; Sternberg, 1998, 2019), while others emphasize non-cognitive, attitudinal aspects of wisdom such as self-transcendence or humor (Levenson, Jennings, Aldwin, & Shiraishi, 2005; Webster, 2007). Table 16.1 gives an overview of psychological wisdom definitions that can be found in the literature.

At first sight, the definitions shown in Table 16.1 may seem to be about different constructs. However, few of them are incompatible with one another. As mentioned earlier, wisdom is a complex, multifaceted construct that integrates facets of knowledge and thinking, personality, and motivation. One important aspect that most wisdom definitions have in common, although not all of them make it explicit, is an orientation toward a greater good than just one's own benefit. The common-good orientation of wisdom is most visible in Robert J. Sternberg's balance theory of wisdom (Sternberg, 1998, 2019). Essentially, Sternberg says that wisdom is practical intelligence that is utilized to balance different interests in a difficult situation so as to maximize a common good, rather than the benefit of any particular party.

In sum, wisdom has been defined in many different ways, but the definitions share some common characteristics. Typical elements of wisdom definitions include:


#### 16.2 How Can Wisdom Be Measured?

One reason why psychologists consider it important to have precise definitions of wisdom is that such definitions are necessary for developing methods to measure wisdom. Only if we have valid measures of wisdom can we study how wisdom manifests itself and how it develops (Glück, 2018; Glück et al., 2013). The Berlin Wisdom Model and the Three-Dimensional Wisdom Model are not just prototypes for different definitions of wisdom; they are also good examples of two traditions in the measurement of wisdom: one focusing on wisdom-related knowledge and thinking (overview in Kunzmann, 2019) and one focusing on wise personality characteristics (overview in Webster, 2019).

Table 16.1: Some definitions of wisdom (adapted from Glück, 2015).


#### 16.2.1 The Berlin Wisdom Paradigm and Other Measures of Wise Thinking

To measure wisdom as expert knowledge, Baltes and colleagues developed the Berlin Wisdom Paradigm (BWP). Participants are presented with brief descriptions of difficult life problems, such as "A fifteen-year-old girl wants to move out of her family home immediately." or "Someone gets a phone call from a good friend. The friend says that he cannot go on anymore and has decided to commit suicide." (e.g., Glück & Baltes, 2006; Staudinger & Baltes, 1996). They are asked to think aloud about what one could consider and do in such a situation. Participants talk about the problem for as long as they want; their responses are recorded and transcribed. The response transcripts are then evaluated by trained raters with respect to the five criteria shown in Table 16.2.

A total of ten independent raters—two for each of the five criteria—are trained to rate the response transcripts on seven-point scales that range from "very little similarity" to "very high similarity" to an ideally wise response. The average across the ten ratings is then used as a participant's wisdom score. The Berlin Wisdom Paradigm is a reliable method, i.e., the two raters per criterion usually show good agreement and the ten ratings are sufficiently interrelated to form a meaningful score (Glück et al., 2013). Validity studies have shown that people who score highly in the BWP have more life experience than other people and are more intelligent and creative, more open to new experiences, and more oriented toward personal growth and supporting others (Kunzmann & Baltes, 2003; Staudinger, Lopez, & Baltes, 1997). Thus, even though the BWP measures wisdom-related knowledge, this knowledge is associated with non-cognitive variables relevant to wisdom.

Table 16.2: The five criteria for wisdom used in the Berlin Wisdom Paradigm.
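The scoring logic of the BWP (ten ratings, two raters per criterion, averaged into one score) can be sketched numerically. The simulated 7-point ratings below are illustrative, not real BWP data, and agreement is checked with a simple per-criterion Pearson correlation rather than the reliability indices used in actual studies:

```python
import numpy as np

rng = np.random.default_rng(1)
n_participants, n_criteria = 30, 5

# Simulated "true" wisdom level per participant and criterion, plus
# independent noise for each of the two raters (hypothetical values).
true_level = rng.uniform(1, 7, size=(n_participants, n_criteria))

def rate(level):
    """One rater's 7-point ratings: true level plus noise, rounded and clipped."""
    return np.clip(np.round(level + rng.normal(scale=0.5, size=level.shape)), 1, 7)

rater_a, rater_b = rate(true_level), rate(true_level)

# A participant's wisdom score: mean of all ten ratings (2 raters x 5 criteria)
wisdom_scores = np.hstack([rater_a, rater_b]).mean(axis=1)

# Rough per-criterion inter-rater agreement via Pearson correlation
agreement = [np.corrcoef(rater_a[:, c], rater_b[:, c])[0, 1]
             for c in range(n_criteria)]
```

Because each rater adds only modest noise around the same underlying level, the two raters' ratings correlate strongly, which is the kind of agreement the reliability claim above refers to.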

More recently, Igor Grossmann built upon the BWP to develop a method for measuring *wise reasoning* (Grossmann, Na, Varnum, Park, Kitayama, & Nisbett, 2010; Oakes, Brienza, Elnakouri, & Grossmann, 2019). Grossmann and colleagues define wise reasoning as "the use of certain types of pragmatic reasoning to navigate important challenges of social life" (Grossmann et al., 2010, p. 7246). Wise reasoning is characterized by dialectical thinking and intellectual humility as manifested, for example, in taking different perspectives, recognizing the limitations of knowledge, making flexible predictions, and searching for compromise. To measure wisdom, Grossmann and colleagues developed vignettes that describe difficult real-life societal or interpersonal problems, such as political conflicts in foreign countries or letters written to a newspaper columnist. Participants are presented with these vignettes and asked to write or talk about how these situations may unfold and why. As in the BWP, trained raters evaluate the transcripts with respect to criteria for wise reasoning. People who show high levels of wise reasoning have been found to be agreeable, nondepressed, and satisfied with their lives (Grossmann, Na, Varnum, Kitayama, & Nisbett, 2013).

Another measure of wisdom-related knowledge focuses on personal or self-related wisdom. As Ursula M. Staudinger has argued (Mickler & Staudinger, 2008; Staudinger, 2019; Staudinger, Dörner, & Mickler, 2005), some people are quite wise when thinking about other people's problems but have great difficulty applying their wisdom to themselves and their own problems. According to Staudinger, "general wisdom" is wisdom about life in general as it concerns other people, whereas "personal wisdom" is wisdom about oneself and one's own life. Measures like the BWP assess people's general wisdom. To measure personal wisdom, Mickler and Staudinger (2008) developed the *Bremen Wisdom Paradigm* (BrWP). In the BrWP, participants are interviewed about themselves as a friend—their typical behaviors, strengths and weaknesses, how they deal with difficult situations in friendships, and the reasons they see for their own behavior. Participants' responses are rated on criteria that parallel those of the BWP but apply to wisdom about oneself: self-knowledge (knowledge about one's strengths and weaknesses, priorities, and life meaning), heuristics of growth and self-regulation (knowing how to deal with challenges and grow from them), interrelating the self (seeing oneself in the context of one's social relations and life situation), self-relativism (being self-reflective and self-critical, but also having a healthy amount of self-esteem), and tolerance of ambiguity (recognizing and managing uncertainty and uncontrollability). As in the BWP, two raters per criterion rate each transcript, and their average is used as the wisdom score. People with high scores in the BrWP are intelligent, open to new experiences, and mature (Mickler & Staudinger, 2008).

The Berlin Wisdom Paradigm, Grossmann's measure of wise reasoning, and the Bremen Wisdom Paradigm all measure wisdom as a competence: a way of thinking about life challenges that draws on knowledge and intelligence and reflects an awareness of variability, uncertainty, and the limitations of one's knowledge. In all three approaches, people produce open-ended responses, which are then rated against criteria specifying what makes a response wise. Researchers who define wisdom as a matter of personality or attitude take a different approach to measuring it.

#### 16.2.2 The Three-dimensional Wisdom Scale and Other Measures of Non-Cognitive Aspects of Wisdom

To measure wisdom according to her three-dimensional model of wisdom as a personality characteristic, Monika Ardelt used the typical way psychologists assess personality: self-report scales. The *Three-Dimensional Wisdom Scale* (3D-WS; Ardelt, 2003) consists of 39 statements, each reflecting one of Ardelt's three dimensions of wisdom. Participants indicate on five-point scales the extent to which they agree with each item. For example, "Sometimes I feel a real compassion for everyone" is an item for the affective dimension. Many items in the 3D-WS are reverse-coded. For example, "Things often go wrong for me by no fault of my own" measures the reflective dimension, but wise persons are expected to disagree with this statement, as they would always be aware of their own role in things that go wrong. "Ignorance is bliss" is a reverse-coded item for the cognitive dimension, as a wise person is assumed to always want to understand things in depth. People's responses to the items are summed to form separate scores for the three dimensions, and these three scores are then averaged into a wisdom score. People who score high on the 3D-WS have been found to have a strong sense of mastery and purpose in life, to be forgiving of others, not very afraid of death, and generally happy (Ardelt, 2003, 2011).
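The scoring procedure just described—reverse-coding negatively keyed items, summing within each dimension, and averaging the three dimension scores—can be sketched in a few lines of Python. The mini-questionnaire below is a hypothetical stand-in, not the actual 39-item scale.

```python
# Minimal sketch of the 3D-WS scoring logic: 5-point responses,
# reverse-coding of negatively keyed items, per-dimension sums,
# and the average of the three dimension scores as the wisdom score.

def score_item(response, reverse):
    """Map a 1-5 response to its item score; reverse-coded items are flipped."""
    if not 1 <= response <= 5:
        raise ValueError("responses lie on a 5-point scale (1-5)")
    return 6 - response if reverse else response

def three_d_ws(responses):
    """Sum item scores within each dimension, then average the dimension scores.

    responses maps each dimension to a list of (response, is_reverse_coded) pairs.
    """
    dimension_scores = {
        dim: sum(score_item(r, rev) for r, rev in items)
        for dim, items in responses.items()
    }
    return sum(dimension_scores.values()) / len(dimension_scores)

# Hypothetical two-item-per-dimension example (the real scale has 39 items):
example = {
    "cognitive":  [(2, True), (4, False)],   # e.g., "Ignorance is bliss" is reverse-coded
    "reflective": [(1, True), (5, False)],
    "affective":  [(4, False), (3, False)],
}
print(three_d_ws(example))  # (8 + 10 + 7) / 3
```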

Jeffrey Dean Webster (2003, 2007) developed the *Self-Assessed Wisdom Scale* (SAWS), which defines wisdom as the willingness and ability to learn from life experiences and to utilize one's insights about life "to facilitate the optimal development of self and others" (Webster, 2007, p. 164). The SAWS consists of 40 items that measure five components of wisdom. Critical life experience (having experienced difficult life challenges, e.g., "I have had to make many important life decisions") is considered a prerequisite to developing wisdom. Reminiscence and reflectiveness (e.g., "I often think about my personal past") enables people to reflect upon and learn from their experiences and to use them in dealing with new challenges. Three personal characteristics help people to reflect upon experiences and grow wiser from them: openness (to perspectives, ideas, and inner experiences, e.g., "I'm very curious about other religious and/or philosophical belief systems"), emotional regulation (being able to perceive and regulate complex feelings, e.g., "I can regulate my emotions when the situation calls for it"), and humor (recognizing ironies and being able to laugh about oneself, which helps reduce stress and bond with others, e.g., "I can chuckle at personal embarrassments"). People with high SAWS scores are also high in ego integrity, generativity, forgiveness, and well-being, and they consider personal growth and supporting others to be important values in their lives (Webster, 2003, 2007, 2010).

Michael R. Levenson and colleagues defined wisdom as self-transcendence (Levenson et al., 2005). Drawing on conceptions from Buddhism, philosophy, and identity development in old age, they argued that wise individuals have acquired in-depth knowledge about themselves, understood that external things like money, success, or fame are not really essential to who a person is, and integrated and accepted the different aspects of their selves. These insights lead them to be at peace with themselves and to become self-transcendent—to care less about themselves and more about others and to feel deeply united with humanity, nature, and the world at large. The Adult Self-Transcendence Inventory (ASTI; Levenson et al., 2005; see also Koller, Levenson, & Glück, 2017) is a 34-item scale that measures self-transcendence and its precursors—self-knowledge, non-attachment, and integration—using items like "My peace of mind is not easily upset" and "I feel that my individual life is part of a greater whole." People scoring high on the ASTI are open to new experiences, extraverted, non-neurotic, and mature, and they often have experience with meditation and related practices.

Finally, the Brief Wisdom Screening Scale (BWSS; Glück et al., 2013) is not based on any specific theory of wisdom. It was developed based on a statistical analysis of data from a study that involved the 3D-WS, SAWS, and ASTI. The researchers used factor analysis to identify a common core across those three wisdom measures and then identified those 21 items from the three scales that were statistically most closely related to this common factor. In other words, the 21 items of the BWSS are closely related to one another and to what is common across the three wisdom self-report scales described earlier.
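The construction logic behind the BWSS can be illustrated with a small simulation. This is a sketch under stated assumptions, not the original analysis: here the common factor is approximated by the first principal component of the item correlation matrix, and the simulated data are random stand-ins for real questionnaire responses.

```python
# Sketch of "find the items most closely related to a common factor":
# extract the first principal component of the item correlation matrix
# and keep the k items that load most strongly on it.
import numpy as np

def select_items(data, k):
    """Return indices of the k items loading most strongly on the first
    principal component.

    data: (n_participants, n_items) matrix of item responses.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    loadings = np.abs(eigvecs[:, -1])         # loadings on the largest component
    return np.argsort(loadings)[::-1][:k]

# Simulated responses: the first 21 of 30 items share a strong common
# factor, the remaining 9 are pure noise, so the selection should
# recover the first 21 items.
rng = np.random.default_rng(42)
factor = rng.normal(size=(500, 1))
data = rng.normal(size=(500, 30))
data[:, :21] += 2.0 * factor                  # factor-loaded items
selected = select_items(data, k=21)
print(sorted(selected.tolist()))
```

In the real BWSS construction, a factor analysis of actual 3D-WS, SAWS, and ASTI responses played the role that the principal component plays in this toy example.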

Other self-report wisdom scales include the Foundational Value Scale (Jason, Reichler, King, Madsen, Camacho, & Marchese, 2001) and the Wisdom Development Scale (Brown & Greene, 2006; Greene & Brown, 2009).

#### 16.2.3 How Can Wisdom Best Be Measured?

The two approaches to measuring wisdom—open-ended measures and self-report scales—both have some advantages, but also disadvantages. Self-report scales are easy to administer: study participants check their responses to each item, and researchers just need to sum or average the responses into a wisdom score. The problem with self-report scales, however, is that people's responses reflect who they think they are, which is not necessarily who they really are. As an example, consider the item "I am good at identifying subtle emotions within myself." Wise people, being highly self-reflective, probably know how difficult it can be to disentangle the complex and ambivalent feelings they have in challenging situations. Therefore, they would probably partially, but not fully, agree with this item. Not-so-wise people, on the other hand, may not even notice the complexity of their more subtle feelings in such situations and might therefore happily select "fully agree." The more general problem is that humility and self-questioning are part of the wise personality, and a self-questioning person is unlikely to describe him- or herself in a very positive way on a self-report scale. Thus, the people who receive the highest scores may not be the wisest ones, but the ones who are most certain of being "wise." In addition, of course, it is quite easy to intentionally "fake" wisdom on a self-report scale. If you want to try this out, fill out the ten items from the Brief Wisdom Screening Scale in Table 16.3 twice—once as you would describe yourself, and once as you think a very wise person would. Self-report measures of wisdom should therefore always be taken with a grain of salt, because they are susceptible both to socially desirable responding and to self-deception.

Open-ended measures do not have this problem: unless you know the criteria by which your response is evaluated, it is much more difficult to produce a wise response to a vignette from the Berlin Wisdom Paradigm than to score high on a self-report scale. However, one problem remains: it may still be easier to talk wisely about what should be done in a hypothetical situation involving a suicidal friend or a difficult teenager than to actually act wisely in such a situation in real life. Real-life wisdom requires not just wise thinking but also emotional strength and balance, self-reflection, and compassion—qualities that the BWP does not measure and that cannot really be inferred from a person's verbal response to a hypothetical problem. Some researchers have tried to measure wisdom in ways that are closer to real life—for example, by presenting participants with videos of real people discussing a conflict (Thomas & Kunzmann, 2013) or by asking participants about actual difficult challenges from their own lives (Brienza, Kung, Santos, Bobocel, & Grossmann, 2018; Glück, Bluck, & Weststrate, in press). A practical disadvantage of open-ended measures compared with self-report scales is that they require far more effort from both participants and researchers: participants are interviewed individually, responses have to be transcribed, and raters have to be trained and paid. For this reason, most studies of wisdom have used self-report scales, but more and more researchers are trying to incorporate at least one open-ended measure to ensure that their results are consistent across methods (e.g., Webster, Weststrate, Ferrari, Munroe, & Pierce, 2018; Weststrate & Glück, 2017a).

Table 16.3: Ten items from the Brief Wisdom Screening Scale. Check how much you agree with each item (1 = disagree completely, 5 = agree completely), then add up the numbers to compute your "wisdom score."

In sum, it is still an open question how wisdom can best be measured. While aspects of wise thinking should be assessed using open-ended measures, self-report scales may be the only way we have to access certain non-cognitive aspects, such as a person's feelings. An optimal measure of wisdom may need to integrate both approaches.

# 16.3 Is Wisdom a Stable Personal Characteristic—Or Are We All Wise Sometimes?

Most people think of wisdom as a quality of a small number of very special people. However, recent research shows that wisdom varies quite considerably by situation (Grossmann, 2017). Most of us have probably done a few very wise things in our lives—and a few very unwise things as well. For example, Glück, Bluck, Baron, and McAdams (2005) interviewed people about situations in which they thought they had done something wise. Almost all participants were able to name at least one situation—making a difficult life decision, dealing with an unexpected emergency, learning to deal with a long-term problem—that they had handled wisely. Why were they able to do the wise thing in those situations, even if they weren't particularly wise people in general? How wisely we act in real life depends not just on our wisdom-related knowledge and personality but also on whether we are able to utilize that knowledge and the relevant facets of our personality in a particular situation.

For example, experiments have shown that people give wiser responses when they are instructed to use certain thinking strategies. Staudinger and Baltes (1996) found that people responded more wisely to the BWP problem about the suicidal friend after spending ten minutes in an imaginary conversation about the problem with a friend. Interestingly, people also scored higher if they *actually* discussed the problem with a friend—but only if they had a few minutes to think about the discussion before responding. Thus, considering someone else's perspective on a problem may help us to act more wisely in a given situation. Similarly, Kross and Grossmann (2012) showed that so-called "self-distancing" interventions improved people's wise reasoning. For example, Americans reasoned more wisely about the possible outcomes of U.S. elections if they tried to think about the elections from an Icelander's perspective than if they considered how the election outcome would affect their own lives. Grossmann and Kross (2014) showed that people reasoned more wisely about a relationship problem if they imagined the problem happening to a friend than if they imagined it happening to themselves. In fact, people even reasoned more wisely if they thought about a problem in the third person ("he/she") than if they thought about it in the first person ("I/me")!

Together, these findings suggest that people are wiser when they mentally distance themselves from a problem and try to take various perspectives on it than when they immerse themselves in it and take a self-centered perspective. How well people can do this in real life, outside psychological experiments, certainly depends on what kind of person they are, but it also depends on the situation. If we are very angry or scared, for example, it is a lot more difficult to take someone else's perspective or even to think clearly about the best way to proceed.

These findings also show that wisdom is not just a matter of wise persons but also of wisdom-fostering situations. When we are able to take a step back and look at the broader picture, take the perspective of others, and acknowledge and regulate our feelings before reacting to a challenge, our wisdom has a far better chance to manifest itself. This brings up the question of how we can create situational contexts that foster wisdom. Wouldn't it be good if we could identify ways to make political, economic, or medical decisions wiser? As discussed earlier, Sternberg's balance theory of wisdom (Sternberg, 1998, 2019) states that a wise solution to a complex problem balances all the different interests involved, so that a common good is achieved. To be able to do that, it is necessary to be aware of all relevant interests and perspectives. Surowiecki (2005) has shown that groups can act more wisely than individuals if their members represent different perspectives and different areas of knowledge about the problem, and if all these different voices are heard and respected. It would seem possible to change the conditions under which, for example, political decisions are made so that such a culture can develop. Groups can, however, also make very bad decisions, especially if their leaders are unwise—that is, foolish—and the group is structured in a highly hierarchical way.

Sternberg (2005; see also Sternberg & Glück, 2019) identified five fallacies that cause people in leading positions to make foolish decisions: *unrealistic optimism* (thinking one is so smart that everything one undertakes will end well, even if it looks to others like a bad idea); *egocentrism* (considering one's own needs and desires as the only thing that really matters); *false omniscience* (believing one knows everything and does not need to listen to others); *false omnipotence* (grossly overestimating one's control over things and therefore setting goals that are far too high); and *false invulnerability* (believing that one will not get caught or will not be hurt by the outcomes of one's decisions). These fallacies are clearly the opposite of wisdom, which is characterized, as described earlier, by a clear awareness of the limitations of one's knowledge and power, a willingness to take different perspectives, and a strong concern for the well-being of others. Unfortunately, power structures in many large organizations, including governments and large companies, tend to reinforce these fallacies: few people will speak up against their leader if doing so is likely to cost them their jobs. One of the most important applications of wisdom psychology to real life may be to develop ways of introducing wisdom-fostering structures into organizations.

# 16.4 Where Does Wisdom Come From?

In a world that is faced with difficult challenges—climate change, global inequality, mass migration, political polarization, failing educational systems, and so on—it seems very important to identify ways to increase wisdom. Broadly, there are two approaches to studying this question. First, some research has looked at how wisdom develops naturally over the course of people's lives. Second, studies have investigated how wisdom can be fostered through interventions—for example, by including teaching for wisdom in school and university curricula.

#### 16.4.1 The Development of Wisdom

How does wisdom develop, and why is it such a relatively rare phenomenon? Is it true that wisdom comes with age? And if it isn't, why do some people still become wiser over the course of their lives?

#### 16.4.1.1 Wisdom and age

When people are asked to name the wisest person they know, they usually come up with an older person (Weststrate et al., 2019). It makes a lot of sense to assume that wisdom comes with age: after all, wisdom is based on life experience, and life experience obviously accumulates over time. Older people have "seen it all", and they are in a phase of life where it may be easier to look back and see what is really important in life when one is no longer struggling to build one's own life. At the same time, few people agree that wisdom generally comes with age (Glück & Bluck, 2011)—we all know some older people who are anything but wise. How do these two notions fit together? Most wisdom researchers believe that many very wise people are, indeed, in the second half of life, but there are few of those very wise people in total (Jeste, Ardelt, Blazer, Kraemer, Vaillant, & Meeks, 2010). Most older people are quite happy and well-adjusted, but few are very wise.

There are a number of studies that have looked at the relationship between wisdom and age in the general population. Virtually all of this research is cross-sectional—that is, people of different ages were compared with respect to their levels of wisdom. These studies have produced surprisingly inconsistent results (Glück, 2019)—in fact, their results seem to depend heavily on which measure of wisdom was used. For the BWP, a strong increase in wisdom has been found between the ages of about 15 and 25 (Pasupathi, Staudinger, & Baltes, 2001), but after that, wisdom-related knowledge seems to neither increase nor decrease with age (Staudinger, 1999), although there may be a small decline in very old age. Scores on the 3D-WS are actually a bit lower in older age groups, mostly because older people score lower on the cognitive dimension of wisdom (Ardelt, 2003; Glück et al., 2013). Many older adults tend to think in less complex ways than young and middle-aged people do.

Recent research has found that wisdom as measured by the 3D-WS is highest in middle and late middle adulthood (Ardelt, Pridgen, & Nutter-Pridgen, 2018). The same pattern has also been found for the SAWS (Webster, Westerhof, & Bohlmeijer, 2014), whereas no relationship with age has been found for the ASTI (Glück et al., 2013; Levenson et al., 2005). Together, these findings would suggest that wisdom peaks in late middle adulthood, that is, in people's 50s and early 60s. However, Grossmann et al. (2010) found a linear positive relationship of wise reasoning with age well into participants' nineties, and Brienza, Kung, Santos, Bobocel, and Grossmann (2018) actually found a U-shaped relationship—that is, the lowest scores in middle age—for their Situated Wise Reasoning Scale (SWIS). In sum, wisdom increases, stays stable, increases then decreases, decreases then increases, or just decreases with age, depending on which measure of wisdom is considered.

The most likely explanation for these inconsistencies is that the different measures emphasize different aspects of wisdom. As mentioned earlier, wisdom is a complex construct that includes several different components (Glück, 2019). Some of these components decrease with age in the general population—for example, openness to experience or the ability and willingness to think in very complex ways. Measures that focus on these components tend to produce lower scores in old age. Other components actually increase with age—for example, compassion and concern for others or a willingness to make compromises and accept one's limitations. Measures emphasizing these aspects tend to produce higher scores in old age. It is also important to keep in mind that findings from cross-sectional studies are affected by so-called cohort effects: the people we compare in such a study differ not only in age but also in the experiences they have had over their lifetimes. The middle-aged and late middle-aged people who show high wisdom scores in current research were born in the 1950s and 1960s—that is, they came of age in the 1960s and 1970s, a period in which wisdom-related qualities may have been valued more highly than was the case for older and, perhaps, also for younger generations. For all these reasons, we do not yet have conclusive evidence on the general relationship between wisdom and age. To understand how wisdom develops, it may be more important to look at individual developmental pathways over people's life courses. Longitudinal studies, which follow the same people over extended periods of their lives, have the potential to show us not just how age cohorts differ in wisdom but how individual life experiences shape a person's wisdom over time. For now, we have relatively little such evidence, but we have some theories about the development of wisdom that shed light on important factors.

#### 16.4.2 Theories of How Wisdom Develops

As described earlier, Paul Baltes and colleagues (Baltes & Smith, 1990; Baltes & Staudinger, 2000) argued that wisdom is expert knowledge about the fundamental pragmatics of human life. The fundamental pragmatics of life are the "big issues" of human existence such as how we should live with the knowledge that we are going to die, how we can balance intimacy and autonomy in our relationships, or the complex moral dilemmas of our modern times. "Expert knowledge" (see Chapter 13, "Expertise") refers to an extraordinary amount of knowledge about a subject domain that is acquired through long-term, intense, goal-oriented practice. Baltes and Smith (1990) discussed in detail how wisdom might develop. They distinguished three types of factors that facilitate the development of wisdom:


According to the Berlin group, people's pathways to wisdom are very different depending on their unique life stories and life experiences. The MORE Life Experience Model (Glück & Bluck, 2013) specifies the role of life experiences in more detail. Its main assumption is that life challenges—experiences that deeply change people's beliefs about themselves or the world—are the main catalysts of the development of wisdom. Such challenges are often negative, such as a serious illness or a difficult conflict, but they can also be positive. For example, many people say that having their first child completely changed their priorities and needs. According to the MORE Life Experience Model, such experiences may not only change people's worldviews but also show them how much worldviews are shaped by experience in general. For example, someone might learn from a divorce or from having a baby not just that it is important to be attentive to one's partner or that unconditional love is possible, but also how little we know about situations that we have not experienced ourselves. In other words, that person might gain insights that correspond to the BWP criteria of lifespan contextualism, value relativism, and recognition of uncertainty.

Thus, life challenges can foster wise insights, but not everybody gains wisdom from them. Especially after a negative experience, many people are not very interested in analyzing what happened—they just want to regain their happiness and emotional balance (Weststrate & Glück, 2017a). Only those people who are willing and able to become "experts on life" are likely to explore the meaning of an experience even if doing so is painful for them. The MORE Life Experience Model proposes that certain psychological resources enable people on the way to wisdom to dig deeper into the meaning of life challenges. The most important resources are the following.

*Openness* is a general interest in multiple perspectives. People on the way toward wisdom are interested in how other people's worldviews, goals, and values differ from their own. They have no difficulty with seeking out advice and learning from others, and they are not afraid of new experiences in their own lives.

*Empathic concern*. People developing wisdom are compassionate with others and deeply motivated to alleviate their suffering. People who care deeply about others will strive to achieve a common good rather than to optimize their own gain in complex situations (Sternberg, 2005). However, wise empathy is not simply taking on others' pain as one's own; it also involves being able to distance oneself so as to help the other person optimally.

*Emotional sensitivity and emotion regulation*. People developing wisdom pay attention not only to the feelings of others but also to their own emotions, and they are skilled at dealing with negative and mixed feelings. They try not to suppress negative feelings but to understand them and learn from them, while at the same time appreciating the positive things in life (König & Glück, 2014). They have learned to manage their emotions as a situation requires, which may sometimes mean recognizing but not showing one's feelings.

*Reflectivity* refers to the idea that people on the way to wisdom are motivated to understand complex issues of human life in their full complexity. Highly reflective people are willing and able to question their own beliefs because learning more about life is more important to them than feeling good about themselves (Weststrate & Glück, 2017a).

*Managing uncertainty and uncontrollability*. Most people tend to overestimate how much control they have over the things that happen in their lives. They believe, for example, that if they eat well and work out, they will never fall ill, or that professional success is simply a matter of hard work. People on the way to wisdom have learned from experience that much in life is uncontrollable—that even people with a healthy lifestyle can have a heart attack, and that good or bad luck plays an important role in people's careers. While they know that something unexpected may happen at any time, they are not anxious or overly cautious, because they have also learned to trust their own ability to deal with whatever may happen.

According to the MORE Life Experience model, people who have high levels of these five resources will


In this sense, gaining wise insights may not always make people happy. In the short run, it may make people happier to not question their own views, ignore unpleasant or complicated feelings, empathize only with their friends and family, and overestimate their control over their life (Staudinger & Glück, 2011; Weststrate & Glück, 2017b). Wisdom may come at a cost, and the path toward it requires a willingness to face the darker sides of human life.

#### 16.4.3 Wisdom Interventions

As discussed earlier, the current state of our world suggests that we urgently need to find ways to foster wisdom—in individuals as well as in systems and institutions. Research up to now has focused on ways to increase individual wisdom. As described in Section 16.3, several studies have shown that short-term interventions can help people access their wisdom-related knowledge and mindset. These interventions include imagining discussing a problem with someone else (Staudinger & Baltes, 1996) or imagining that the problem concerns not oneself but someone else (Grossmann & Kross, 2014; Kross & Grossmann, 2012). Another class of interventions consists, of course, of actually discussing a problem with someone else, which has been found to foster wisdom in an experimental setting (Staudinger & Baltes, 1996) as well as in retrospective accounts of real-life experiences (Igarashi, Levenson, & Aldwin, 2018). In this vein, a promising approach to fostering wisdom might lie in simply instructing people to ask for, and listen to, information and advice from others when they are facing a difficult problem. But what characterizes wise advice? How wise people give advice to others is an interesting and understudied question.

As discussed earlier, in addition to increasing wisdom in individuals, it seems important that we look more into the way situational contexts can foster wisdom (Grossmann, 2017; Surowiecki, 2005). Why, for example, do interactions in online discussion boards often become uncivil and polarized, especially when they are about an ideological or political topic? Perhaps simple interventions, such as having users rate the wisdom of each statement instead of "liking" or "disliking" it, might create an incentive for more balanced and constructive conversations.

In addition to such situational short-term interventions, researchers have discussed how wisdom could be implemented as a goal in more long-term interventions, such as school curricula or psychotherapy. Sternberg (2001; Reznitskaya & Sternberg, 2004) suggested teaching for wisdom in schools, criticizing today's curricula for focusing on academic intelligence at the expense of wisdom and ethics. He argued that exercises such as reflecting on and discussing one's own values, the possible consequences of decisions, or ethically relevant topics in classes on history or social sciences can have a long-term effect on the development of wisdom. Michael Linden and colleagues, on the other hand, argue that psychotherapy can explicitly focus on elements of wisdom such as perspective-taking or emotion regulation (Linden, Baumann, Lieberei, Lorenz, & Rotter, 2011). In a broader sense, one could argue that many general goals of psychotherapy, such as increased self-reflection, awareness and regulation of emotions, and empathy, are also components of wisdom.

In a world that is facing enormous global challenges, the psychology of wisdom may have important contributions to make. Globally as well as individually, we need to learn how to make decisions that are not just smart but wise—decisions that balance our own interests with those of others and the world at large.

# Summary

1. What is wisdom? There are a number of definitions of wisdom in the psychological literature. Wisdom is a complex and multifaceted construct, and different definitions tend to emphasize different aspects of it. The most important components of wisdom are (a) broad and deep life experience and life knowledge, (b) an awareness of the variability and uncertainty of human life and a willingness to consider different perspectives, (c) self-reflection, self-knowledge, and self-acceptance, and (d) compassionate concern for others and a motivation to serve a greater good.


#### Review Questions


# Hot Topics: Wise Solutions for Complex Global Problems

Judith Glück (Photo: Barbara Maier)

What can we do to make today's world wiser? Our world is faced with enormous global challenges including climate change, global inequality, political polarization and rising populism, the negative effects of digitalization, and educational systems that seem to fail at teaching students how to navigate these challenges. What are wise ways to deal with these problems? While earlier wisdom research focused on wisdom as a characteristic of persons, more recent research is beginning to examine how situations foster or hinder wisdom. To develop wise solutions to complex world problems, however, we need to learn more about the processes of making wise decisions. If, as Robert J. Sternberg (2019) argues, wisdom involves a balancing of different interests that optimizes a common good, how exactly is this balance achieved? There is a large body of scientific research on judgment and decision-making, but most of these studies have focused on problems that have pre-defined optimal solutions. New research is needed that identifies wise approaches to solving complex, ill-defined problems. Another open question is how we can create systems that invite or reward wise behavior, e.g., from politicians and policymakers. Recent political developments show that voters are not necessarily attracted by wise political candidates (Sternberg, 2019), so other mechanisms are required to ensure a certain level of wisdom in politics. All democratic countries have constitutional checks and balances that are supposed to protect them against undemocratic developments. However, the recent rise of populism in many Western democracies (Levitsky & Ziblatt, 2018) casts doubt on the efficacy of these safeguards in a time of social media and ideological polarization. Wisdom research needs to investigate how political systems can contribute to wise politics, and how people can be made more aware of the importance of wisdom for the survival of our planet.

#### References

Levitsky, S., & Ziblatt, D. (2018). *How democracies die*. New York: Crown Publishing.

Sternberg, R. J. (2019). Why people often prefer wise guys to guys who are wise: An augmented balance theory of the production and reception of wisdom. In R. J. Sternberg & J. Glück (Eds.), *The Cambridge handbook of wisdom* (pp. 162–181). Cambridge: Cambridge University Press. doi:10.1017/9781108568272.009


# References


*sonality and Social Psychology*, *115*(6), 1093–1126. doi:10.1037/pspp0000171


*Series B: Psychological Sciences*, *73*, 1350–1358. doi:10.1093/geronb/gby002


Linden, M., Baumann, K., Lieberei, B., Lorenz, C., & Rotter, M. (2011). Treatment of posttraumatic embitterment disorder with cognitive behaviour therapy based on wisdom psychology and hedonia strategies. *Psychotherapy and Psychosomatics*, *80*(4), 199–205. doi:10.1159/000321580


*tives* (pp. 191–219). New York: Cambridge University Press. doi:10.1017/CBO9780511610486.009


# Glossary


of them have in common is that they dedicated their lives to a cause that benefited many people and changed the world by peaceful means.


# Chapter 17

# Development of Human Thought

#### KATHLEEN M. GALOTTI

#### Carleton College

My then fourteen-year-old daughter wanted to upgrade her cellphone to an expensive smartphone model. She "mentioned" this topic several days a week, for several months. At first, she described all the advantages for her, personally: she'd be able to take more pictures, use Instagram and Snapchat more easily, and text more friends for free. Although numerous, none of these reasons was particularly compelling to me. Eventually, she created a several-slide PowerPoint presentation describing costs and benefits that *did* matter to me—including being able to track where she was, the ability to create a local internet hotspot, and chores she promised to do if/when she got the model of phone she was angling for. So persuasive was she that I ended up getting two iPhones—one for each of us (thanks to a two-for-one special).

This ability to plan and marshal a convincing argument is a textbook example of a developing cognitive ability. At earlier points in her development, my daughter could do little more than express her desires (often loudly) or offer one-sided and non-compelling arguments ("I really, really, *really* want it"). Her ability to adopt my point of view and use it to offer reasons and incentives that persuaded me to adopt her perspective is a gradually emerging one, and it will be the focus of this chapter.

First, we'll talk about different realms of thought, including problem solving, reasoning, decision making, planning, and goal setting. All of these terms come under the broader term of thinking, and we will explore definitions and connections among these various instances of thought. We will then take a chronological look at how these different realms of thought develop. We will look at some precursors in infancy and the toddler years. We'll have much more to say about the development of thought in the preschool years, when children become much more verbal. Examination of the elementary school years will show that children gather a lot of information to construct a knowledge base, even as they refine many of their thinking skills. Finally, we'll see dramatic improvements in many if not all realms of thinking when we examine adolescence and young adulthood.

# 17.1 Defining the Domain: Realms of Thought

Let's start by defining a few key terms that we'll be discussing in this chapter. Consider the term *thinking*. It's a pretty broad term, used to cover a lot of different kinds of mental activities, including making inferences, filling in gaps, searching through mental spaces and lists, and deciding what to do when in doubt. I'll use it in this chapter as the overall label for mental activities that process information.

The terms problem solving, reasoning, and decision making are often used interchangeably with the term *thinking*. Many psychologists see the first three as special cases of the fourth. Specifically, when cognitive psychologists speak of problem solving, they refer to instances where a person is trying to find a solution to some sort of impediment (see Chapter 9, "Problem Solving"). When they speak of reasoning, they mean a specific kind of thinking done to draw inferences, such as you might do in solving certain puzzles or reading a mystery novel (see Chapter 7, "Deductive Reasoning", and Chapter 8, "Inductive Reasoning"). Reasoning often involves the use of certain principles of logic. The term *decision making*, then, refers to the mental activities that take place when one chooses among alternatives (see Chapter 10, "Decision Making").

Goal setting as used here means a mental activity in which one sets specific intentions to achieve some specific objective or aim. This term is intertwined with planning, which indicates a projection into the future of a trajectory by which goals can be attained, including sourcing the materials and resources needed and taking the steps necessary to achieve an objective.

It is important to note here that the thinking tasks we'll talk about make use of two other important cognitive realms: language and the knowledge base. Language refers to the ways people comprehend and produce utterances (whether in speech or in writing; see Chapter 11, "The Nature of Language", and Chapter 12, "Language and Thought"). Being a proficient language user certainly helps when it comes to understanding and expressing one's arguments, decisions, or plans.

The knowledge base refers to the sum total of stored information that an individual possesses (see Chapter 4, "Concepts: Structure and Acquisition", and Chapter 5, "Knowledge Representation and Acquisition"). For example, I know hundreds of thousands of words; I have previously memorized multiplication tables up to 12 and can quickly retrieve from memory many multiplication facts; I remember names of teachers and classmates from my kindergarten year up through graduate school; I also know about parenting, dog training techniques, mystery stories, *Pokemon Go* and some television series (currently I'm binge-watching *Scandal*). When people think, they think *about* things, and the richer their knowledge base, the richer their thinking about propositions derived from it.

With those introductory remarks in mind, let's turn to a chronological look at the development of thinking in infancy through adolescence.

# 17.2 Infancy and Toddlerhood

It might seem a little incongruous to have a section on thought in infancy. After all, one of the great cognitive developmental theorists, Jean Piaget, argued that infants were at a stage of development where, essentially, they did not have thought (Piaget, 1952). Piaget believed that individuals passed through a series of stages in their cognitive development, with each stage defined by a qualitatively different set of intellectual structures through which the individual processed information and understood the world. The first stage of cognitive development, which spans from birth to roughly 2 years, was named the *sensorimotor* stage by Piaget, because his belief was that infants and toddlers were limited in their cognition to sensory experiences and motor responses. Put another way, from birth through the first 18 to 24 months, infants and toddlers were said to lack a capacity for mental representation, the ability to construct internal depictions of information.

One of Piaget's most famous demonstrations of (the lack of) infant cognition is on the so-called "object permanence" task, depicted in Figure 17.1. A young (say, five- or six-month-old) infant is seated facing a desirable object or toy. Suddenly, some sort of screen is placed between the infant and the object. Typically, the infant almost immediately appears to lose all interest, as if the object or toy has somehow ceased to exist! Piaget's explanation is that objects out of sensorimotor contact are truly "out of mind", because the infant has no capacity for mental representation.

Because he believed that infants lack that capacity, Piaget would conclude that infants really don't do very much, if any, "thinking." However, some recent work has challenged Piagetian interpretations of infant cognition, and reawakened the idea that infants do have some knowledge and some rudimentary mental activity that can be clearly labelled as "thinking."

Figure 17.1: According to Piaget, until object permanence develops, babies fail to understand that objects still exist when no longer in view. Source: Galotti (2017, p. 113).

One of the most prolific researchers posing this challenge to Piaget is psychologist Renée Baillargeon. Here, we will only cover a small fraction of her elaborate body of work. In one classic study (Baillargeon, 1986), she seated infants (6-8 months old) in front of a screen set up to the right of an inclined ramp. During the first phase of the study, infants saw the screen raised and lowered. Behind the screen was a track for a small toy car. After the screen was lowered, infants saw a small toy car go down the inclined ramp and to the right, behind the screen.

Next, infants were given the impossible/possible events task, in which they were tested with one of two events—the first, a "possible" event, occurred when the screen was raised. It revealed a box sitting behind the track. As in the first phase of the study, after the screen was lowered, the car rolled down the ramp and across the track behind the screen. The second, "impossible" event was very similar to the possible event, except that the box was actually placed *on* the track instead of behind it.

Now, according to Piaget, 6-month-old infants ought not to react any differently to the "possible" than to the "impossible" event. Lacking a sense of object permanence, they should be just as unsurprised to see a car roll in front of a box as "through" a box—after all, if infants have no expectations of objects continuing to exist when hidden behind a screen, then they would have forgotten all about the existence of the occluded box anyway. But Baillargeon's results showed something clearly at odds with Piagetian predictions. Her 6.5- and 8-month-old participants, and even some 4-month-old female participants, looked longer at the "on-track" "impossible" event. Baillargeon interpreted this result to mean that the infants "(a) believe that the box continued to exist, in its same location, after the screen was lowered; (b) believed that the car continued to exist, and pursued its trajectory, when behind the screen; (c) realized that the car could not roll through the space occupied by the box; and hence (d) were surprised to see the car roll past the screen when the box lay in its path" (Baillargeon, 1999, p. 128).

In a related study, Baillargeon and DeVos (1991) presented 3.5-month-old infants with an unusual stimulus display. Each infant saw one of two habituation events first: either a short carrot or a tall carrot moved behind a large rectangular yellow screen and, a few seconds later, an identical-looking carrot emerged from the right-hand side of the screen. In other words, it looked as though the same carrot simply traveled behind the occluding screen. After a 1-second pause, the experimenter slid the carrot back behind the yellow occluding screen, paused for 2 seconds, and then slid it out again from behind the left edge of the screen. This cycle of the carrot disappearing and reappearing continued until the infant reached a predetermined criterion of looking time, or looked away after having attended to the stimulus.

Next came either a "possible" or "impossible" event. This event was the same as the corresponding habituation event, *except that* the occluding screen had a new color, blue, meant to draw infants' attention to the fact that the screen was new. It also had a new shape: a large rectangle with a smaller rectangle "cut out" from the top. The idea was that short carrots ought to fit completely behind the new screen all the way across, and thus the possible event ought not to have been perceived as all that surprising. However, a tall carrot would *not* have fit behind the new screen—its top ought to have been visible as it moved through the "cut out" portion of the screen if it were moving from one end to the other. Thus, the tall carrot moving behind the new screen ought to have been an impossible event.

Results showed that although infants looked for an equal amount of time at the two habituation events (i.e., tall vs. short carrots moving behind the rectangular yellow screen), they looked longer at the impossible than the possible test event. Baillargeon and DeVos (1991) took this result as evidence that their three-and-a-half-month-old infants "(a) realized that each carrot continued to exist after it slid behind the screen, (b) assumed that each carrot retained its height behind the screen, (c) believed that each carrot pursued its trajectory behind the screen, and therefore, (d) expected the tall carrot to be visible in the screen window [the opening in the blue test screen] and were surprised that it was not" (p. 1233).

These conclusions (and others from Baillargeon's additional studies not described here) strongly suggest that even fairly young infants possess a fair amount of knowledge about what objects are and how they behave. Baillargeon (2008) believes that infants begin with an innate principle of persistence, "which states that objects persist, as they are, in time and space" (p. 11). From this initial knowledge, infants gather perceptual information and use it to construct more complex and detailed representations of objects and, in so doing, learn more about how objects behave and what their properties are. So, if you believe Baillargeon's interpretations (and not everyone does; see Cohen & Cashon 2013 for a critique), young infants *do* have some knowledge about objects. What about knowledge about social beings?

In a recent review, Baillargeon, Scott, and Bian (2016) present evidence from many different studies from many different laboratories that young infants and toddlers can reason about agents' goals and states and can use this information to predict an agent's future actions. Here's just one example (from Woodward, 2009): an infant sees an adult seated at a table with two different toys (let's call them A and B) in front of her. She reaches for and grasps one of the toys (A). Infants watch repetitions of this action for some predetermined amount of time, becoming habituated to seeing this action. Next, they see the same adult in front of the same two toys, which have now traded positions. Infants as young as five months look longer when the adult reaches for the new toy (B) than they do when the adult reaches for (A). According to Baillargeon et al. (2016), these infants "(a) attributed to the agent a preference or liking for object A, as the agent always chose it over object B, and (b) expected the agent to continue acting on this preference. . ." (p. 162). This finding has been replicated in several laboratories.

Baillargeon, along with other developmental psychologists such as Elizabeth Spelke and Susan Carey, argues that infants are born with some amount of "core knowledge." The existence of these innate systems does not imply that infants can articulate all their principles. Indeed, infants aren't known for their articulation abilities in any domain. Instead, the implication here is that infants come into the world prepared to make certain assumptions, entertain certain hypotheses, or hold certain expectations of the way objects will or won't behave. Thus, they do have some knowledge, and thus, they can do some rudimentary reasoning about it.

### 17.3 The Preschool Period

It is in the preschool period that we see the first glimmers of what cognitive psychologists call "higher order cognitive processes"—processes that operate on mental representations. These glimmers are fleeting and fragile, but also unmistakable signs of growing maturity of thought.

One of my personal favorite demonstrations of preschooler reasoning competence comes from the work of Hawkins, Pea, Glick, and Scribner (1984). They demonstrated that, under certain circumstances at least, preschoolers aged 4 and 5 years could draw deductive inferences (see Chapter 7, "Deductive Reasoning"). They began by constructing various reasoning problems, examples of which are shown in Table 17.1. There were three types of problems. The first consisted of premises that were congruent with the child's world knowledge—for example, "Bears have big teeth. Animals with big teeth can't read books. Can bears read books?" Note that whether a child actually reasoned from the premises or from her world knowledge of the general illiteracy of bears, she would have arrived at the deductively correct conclusion, "No." Preschoolers were expected to do particularly well on these problems, even if their scores overstated their true reasoning ability.

A second type of problem included information that was incongruent with the child's world knowledge—for example, "Glasses bounce when they fall. Everything that bounces is made of rubber. Are glasses made of rubber?" Here, the real-world correct answer is directly at odds with the answer a reasoner would derive from strictly reasoning from the premises to answer the question. Preschoolers were expected to do particularly poorly on these problems, as it was expected they would answer the questions using their world knowledge rather than use abstract reasoning to derive a valid conclusion.

The most theoretically interesting type of problem was one using so-called "fantasy" premises—for example, "Every banga is purple. Purple animals always sneeze at people. Do bangas sneeze at people?" Notice that in these problems, there is no relevant world knowledge for the child to call upon. Hawkins et al. (1984) believed, then, that fantasy problems would be the ones most likely to reveal whether or not preschool children could, in fact, draw logical inferences.
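The underlying logical form is the same for all three problem types; the notation below is my rendering (the study presented the problems purely verbally). The fantasy example is a hypothetical syllogism:

```latex
% "Every banga is purple. Purple animals always sneeze at people.
%  Do bangas sneeze at people?"
\forall x\,\bigl(\mathit{Banga}(x) \rightarrow \mathit{Purple}(x)\bigr),\quad
\forall x\,\bigl(\mathit{Purple}(x) \rightarrow \mathit{Sneezes}(x)\bigr)
\;\;\vdash\;\;
\forall x\,\bigl(\mathit{Banga}(x) \rightarrow \mathit{Sneezes}(x)\bigr)
```

The conclusion follows from the premises alone, which is exactly why fantasy content isolates logical reasoning from world knowledge: a child cannot answer from experience with bangas.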

The results were clear-cut. Children were presented with 8 problems of each kind. Overall, children gave correct responses to an average of 7.5, 1.0, and 5.8 of the congruent, incongruent, and fantasy problems, respectively. Chance-level performance was 4 out of 8, and thus children performed significantly better than chance on the fantasy (and congruent) problems. Thus, the authors concluded, preschool children, under limited circumstances, *can* reason deductively.

Moreover, the order in which the problems were administered was crucially important. Children who reasoned with fantasy premises first tended to perform better on all problems, even the congruent and incongruent ones, than did the children who received congruent problems first, incongruent problems first, or problems in a jumbled order. Hawkins et al. (1984) argued that presenting fantasy problems first sets a context for children to help cue them as to how to correctly solve the problem. When congruent or incongruent problems were presented first, children mistakenly recruited their real-world knowledge to answer the questions, instead of relying strictly on the premises.

Of course, being able to draw a deductive inference in certain circumstances does not prove that preschoolers are fully capable of deductive reasoning. Adults can reason better than preschoolers on just about every problem, but do especially well with incongruent content. Indeed, Markovits and Barrouillet (2004) argue that what happens with cognitive development is increasing control over complex forms of reasoning, and being able to divorce one's store of knowledge about the world from the information presented in the premises to a problem.

Another important development in children's thinking in the preschool period concerns the development of theory of mind. A person's theory of mind is the ability to reason about mental states (Apperly, 2012). Thus, theory of mind guides a person's beliefs and expectations about what another person is thinking, feeling, or expecting; it guides one's ability to predict accurately what another person's reaction will be to a specific set of circumstances (Flavell, Green, & Flavell, 1995). This ability develops rapidly between the ages of two and five.

One common task used to investigate preschool children's theory of mind is the so-called *false belief task* (Wimmer & Perner, 1983). For example, children might be told a story about a boy who puts a toy in a box and leaves the room. While he is away, his sister enters the room, takes the toy out of the box, plays with it, and puts it away in a different location. Children are then asked where the *boy* (who was not present in the room at the time the toy was moved) will think the toy is. In other words, can the children disentangle their own state of knowledge about the toy from the state of knowledge or belief of someone who lacks their information?

| Model | Affirmative Example | Negative Example |
|---|---|---|
| A is B <br> B is C <br> A is C | Every banga is purple. <br> Purple animals always sneeze at people. <br> Do bangas sneeze at people? | Bears have big teeth. <br> Animals with big teeth can't read books. <br> Can bears read books? |
| A has B <br> C is an A <br> C has B | Pogs wear blue boots. <br> Tom is a pog. <br> Does Tom wear blue boots? | Rabbits never bite. <br> Cuddly is a rabbit. <br> Does Cuddly bite? |
| A does B when ... <br> B is C <br> A has C | Glasses bounce when they fall. <br> Everything that bounces is made of rubber. <br> Are glasses made of rubber? | Merds laugh when they're happy. <br> Animals that laugh don't like mushrooms. <br> Do merds like mushrooms? |

Table 17.1: Types of problems used by Hawkins, Pea, Glick, and Scribner (1984). Source: Galotti (2017, p. 230), adapted from Hawkins, Pea, Glick, and Scribner (1984, p. 585).

Another theory of mind task is the *unexpected contents task* (e.g., Gopnik & Astington, 1988), in which a child is handed a box of, say, crayons but opens it to discover that the box really contains small candies. The child is then asked to predict what another child, who has no previous experience with the crayon box, will think is inside. Typically, children younger than about 4 years answer that they knew all along that the box contained candies rather than crayons, even though they initially answered "crayons" when asked what was in the box. Further, young preschoolers respond that someone else coming into the room later will think that the crayon box contains candies rather than crayons.

Apperly (2012) makes the argument that although theory of mind is studied widely in preschoolers, it's a mistake to believe that only preschoolers struggle with this concept. Infants, as we've just seen, have some (if incomplete) knowledge about others' goals; adults show stable individual differences in their ability to predict others' motivations and intentions. Thus, theory of mind is not something that a child "finishes" developing at age 5. However, most researchers agree that there is rapid development in theory of mind during the preschool period, and it seems to correlate with developments in language, pretend play, symbolic understanding, and inhibitory control, the ability to maintain focus and resist the temptation to become distracted (Carlson, Moses & Claxton, 2004; Lillard & Kavanaugh, 2014; Wellman, Cross, & Watson, 2001).

# 17.4 Middle Childhood

One of the more noticeable aspects of cognitive development in middle childhood is the growth of the knowledge base (see Chapter 5, "Knowledge Representation and Acquisition"). School-aged children in the United States learn an incredible amount of what adults would consider "basic" information: vocabulary words; how to read; how to use different punctuation marks; addition, subtraction, multiplication, and division facts; historical and geographical facts; information about certain authors; and information about animals, planets, and machines, to take just a few examples from my children's elementary school's curriculum. Add to that knowledge of domains that aren't formally taught in schools—how to play *Minecraft*, how to operate an iPhone, or characters from the *Magic Tree House* or *Harry Potter* book series are just a few examples.

With this tremendous acquisition of knowledge going on, children need to find efficient ways of storing and representing it. (As an analogy, think about files on your laptop. It doesn't matter very much what you call them when you only have a small number, but when you get up into the thousands of files, how you organize them might well determine whether or not you will ever find a particular one again.) How children represent and organize their knowledge is certainly a matter of active debate and discussion in the field. Presumably, their knowledge bases underlie their ability to draw inferences from examples they see. Like so many other topics in this chapter, we'll only have space to cover a couple of examples.

Kalish, Kim, and Young (2012) reported on three studies of preschoolers and young school-aged children that we will focus on. The task presented children with a number of individual examples of a category, e.g., small plastic frogs or dinosaurs that were either yellow or blue. Typically, children would first see a *biconditional relation* between color and species. For example, they might be shown four yellow dinosaurs and four blue frogs, one at a time. What makes this relationship *biconditional* is that all yellow things are dinosaurs, and all dinosaurs are yellow.

In a second phase of the task, children were presented (again, one at a time) with examples some of which undermined the biconditional relationship. For example, children might see six yellow frogs and two yellow dinosaurs. So, after this information is presented, it is no longer true that all yellow things are dinosaurs, nor that all frogs are blue. However, there are *conditional* relationships that remain true even after this phase of the task. For example, the relationship, *If an item is a dinosaur, it is yellow* remains true, although it allows for the possibility of other yellow things, for example, frogs, existing.
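In logical notation (mine, not the authors'), the distinction the children had to track is between the biconditional and one of its component conditionals:

```latex
% Phase 1 evidence (4 yellow dinosaurs, 4 blue frogs) is consistent
% with a biconditional relation:
\mathit{Dinosaur}(x) \leftrightarrow \mathit{Yellow}(x)
% Phase 2 (yellow frogs appear) falsifies the biconditional,
% but leaves one direction of it intact:
\mathit{Dinosaur}(x) \rightarrow \mathit{Yellow}(x)
```

Revising from the first relation to the second, rather than discarding both, is the belief-revision step that the seven-year-olds managed and the five-year-olds did not.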

Older (seven-year-old) children were able to see that some conditional relationships (*if dinosaur, then yellow*) were true after the second phase of the task even though the biconditional relationship (*all and only yellow things are dinosaurs*) was not. That is, they were able to revise their beliefs about what relationships held in light of new evidence. The ability to make this revision seemed, in contrast, to escape the five-year-olds.

These results echo ones reported earlier by Deanna Kuhn (1977) who presented children aged 6-14 with conditional reasoning problems all pertaining to the fictional land of Tundor. She began with a pretest disguised as a game where she would give them one piece of information about Tundor (e.g., "John is tall, and Bob is short") and then ask questions (e.g., "Is Bob tall?") to which the child could respond "yes", "no", or "maybe." The pretest gave examples of questions that could be answered definitively as well as ones that could not, based on the given information. Only children who correctly answered both pretest questions were allowed to continue.

Next, Kuhn (1977) gave children conditional reasoning problems. For example, "All of the people in Tundor are happy. Jean lives in Tundor. Is Jean happy?" (The correct, logically valid answer is *yes*, and this is considered a fairly easy inference to draw.) Or, "All people who live in Tundor own cats. Mike does not live in Tundor. Does he own a cat?" (Here, the correct answer is *maybe*; no logically necessary inference can be drawn, though even adults make mistakes on this type of problem.) Kuhn found that even the first graders showed some reasoning ability, particularly on easy problems. Children did less well on the more difficult problems (the ones adults make mistakes on), unsurprisingly. In similar studies, Janveau-Brennan and Markovits (1999) concluded that children are likely reasoning in ways fundamentally similar to the way adults reason, at least by the time they are in middle childhood, and when they are reasoning with concrete kinds of content rather than abstract propositions.

#### 17.5 Adolescence

Cognitive developmental psychologists have long noticed another major change in thinking that occurs right around puberty. Adolescents are much more capable than younger children of thinking hypothetically and about the future, and of thinking abstractly rather than only with concrete instances, as they did in childhood (Byrnes, 2003; Galotti, 2017). A now-classic study by Daniel Osherson and Ellen Markman (1975) illustrates this last point very well.

Children, adolescents, and adults were shown small plastic poker chips in assorted solid colors, and were told that the experimenter would be saying some things about the chips and that they should indicate after each statement if it was true, if it was false, or if they "couldn't tell." Some of the statements were made about chips held visibly in the experimenter's open hand. Other, similar statements were made about chips hidden in the experimenter's closed hand. Among the statements used were logical tautologies (statements true by definition)—for example, "Either the chip in my hand is yellow, or it is not yellow"; logical contradictions (statements false by definition)—for example, "The chip in my hand is white, and it is not white"; and statements that were neither true nor false by definition but depended on the color of the chip (e.g., "The chip in my hand is not blue and it is green").
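In propositional notation (again, my rendering rather than the study's), the three statement types differ in whether their truth value depends on the facts at all:

```latex
% Tautology: true under every assignment, so it can be judged
% without seeing the chip ("yellow or not yellow"):
p \lor \neg p
% Contradiction: false under every assignment ("white and not white"):
q \land \neg q
% Contingent: truth depends on the chip's actual color
% ("not blue and green"):
\neg r \land s
```

Here *p*, *q*, *r*, and *s* stand for color claims about the hidden or visible chip; only the contingent statement requires looking at the chip to evaluate.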

Younger children (those in grades 1, 2, 3 and even 6) had difficulty distinguishing between statements that were empirically true or false (i.e., true in fact) and those that were logically true or false (i.e., true by necessity or definition). They did not respond correctly to tautologies and contradictions, especially in the hidden condition. They tended to believe, for example, that a statement such as "Either the chip in my hand is red, or it is not red" cannot be assessed unless the chip is visible. Tenth graders and adults, in contrast, were much more likely to respond that even when the chip couldn't be seen, if the statement was a tautology or contradiction, the statement about it could be evaluated on the basis of the syntactic form of the sentence. Said another way, adolescents and adults are able to examine the *logical form* of a statement, instead of insisting that none of the "hidden" statements could be evaluated.

Thinking about the future is also an important emerging capability in adolescence (Nurmi, 1991). Being able to project oneself into a future context requires an ability to think beyond the current set of

circumstances. For example, most sixteen-year-olds in the United States are high school students who live with parents or guardians. But as they prepare for adult life, they have to be able to imagine what it will be like to live independently, find and keep a job, and decide whether and what kind of further education to seek, among other life-framing decisions.

This kind of thinking is crucial to what cognitive developmental theorists call identity development. This term refers to the development of a mature sense of who you are and what your goals, values, and principles are. Lifespan developmental psychologist Erik Erikson (1968) was the first to highlight the construction or discovery of identity as a major developmental task, typically first encountered during adolescence. Psychologist James Marcia (1966), however, is the one credited with operationalizing this idea and developing measures to study it.

Marcia (1966) saw identity development as encompassing four possible statuses, and these are depicted in Figure 17.2. Marcia asserted that identity status is defined jointly by two factors: whether or not the person had made a definite choice or commitment (e.g., to a career, to a value system, to a romantic partner) and whether or not the person had gone through some sort of "crisis", or period of active doubt and exploration, in making that choice.
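Because the two factors are each binary, the scheme can be sketched as a simple lookup (an illustrative sketch; the status labels follow Marcia's four statuses as discussed in this section):

```python
# Illustrative sketch of Marcia's (1966) two-factor scheme: identity
# status as a function of whether a commitment has been made and whether
# a crisis (period of doubt and exploration) has been experienced.

def identity_status(commitment: bool, crisis: bool) -> str:
    if commitment and crisis:
        return "identity achieved"   # committed after active exploration
    if commitment and not crisis:
        return "foreclosure"         # committed without ever exploring
    if not commitment and crisis:
        return "moratorium"          # exploring, not yet committed
    return "identity diffused"       # neither exploring nor committed

print(identity_status(commitment=True, crisis=False))  # foreclosure
```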

A teen in the *identity diffused* status has not made any commitments and has not developed a relevant set of values or principles with which to guide his goal setting and decision making in a given realm (e.g., career, education, political philosophy, religious affiliation). He has not experienced a period of crisis or doubt but, rather, either is in the early phase of identity development or is simply drifting along, with no set plan for the future.

An adolescent in the *foreclosure* status, in contrast, is very committed to a plan and/or to a set of values and principles. Similar to her identity diffused colleagues, however, she has never experienced a crisis or period of doubt. Typically, this indicates that she has adopted someone else's goals and plans, most often those of a parent or another significant adult figure. Thus, adolescents in this status tend to have a very narrow vision for their future—and not much autonomy or power in making decisions. Students who enter my office on their very first day of college, announcing they are "premed" or "prelaw" because both of their parents are doctors or lawyers and they've known since they were 5 what they'd be, tend to present rather textbook examples of foreclosure.

Figure 17.2: Marcia's Identity Statuses. Source: Galotti (2017, p. 399).

The *moratorium* identity status is often typified by college students who "want to keep all their options open." They are actively exploring different options, experimenting and trying on for size the possibility of different majors, different careers, and different religious or political affiliations. The moratorium student is usually struggling and not in what others would call a "stable" state—that is, this individual is likely to remain in this state for only a brief period—a year or two (Moshman, 2011). Moratorium is a period of delay, in which the individual knows that a commitment must soon be made but is not yet ready to make it. Individuals in this status usually either resolve this crisis in a positive way, moving into the identity achieved status, or, in less successful cases, retreat into identity diffusion.

Marcia (1966) held that only individuals who experienced moratorium could move into the *identity achieved* status. The individual here has made one or more personal commitments, after having struggled to find her or his own path toward that decision. This student has considered alternative options and weighed both the pros and the cons. This status is seen as marking a successful end to adolescent development, as a bridge has been built from one's childhood to one's future adulthood. Accompanying identity achievement are increases in self-acceptance.

Many theorists find Marcia's (1966) proposal a useful analogy for understanding one major realm of adolescent development (Moshman, 2011). Identity encompasses an adolescent's value system as well as her view of knowledge and herself as a learner and an agent in the world.

# 17.6 Conclusion

Our look at the development of thought has been brief and selective. I've tried to give you some flavor of the changes occurring during the first two decades of life when it comes to higher-order cognitive processes. We have seen a gradual increase in knowledge of the world—the inputs used in thinking and reasoning and decision making. Although infants are not without relevant knowledge of the world, certainly they have much less when compared with a child in third grade or an adolescent. We've also seen that thinking becomes more abstract, more

flexible, and sometimes even more hypothetical with increasing levels of cognitive development.

Many questions remain to be resolved. How many of the changes we've described are due to factors such as biological maturation, say, versus education, experience, and expertise? Are there periods of rapid change in thinking, or is the entire process an orderly and continuous one? How different are the trajectories of thinking for children who grow up in very different cultures? Are the developmental paths for thinking general-purpose and broad, or does thinking develop differently in different domains? Stay tuned to the field of cognitive development to find the answers to these important questions!

# Summary


#### Review Questions


#### Hot Topic

Kathleen Galotti (Photo: Tania Legvold)

My research program is centered on the question, how do ordinary people facing important decisions go about the process of choosing an option? I've studied adults choosing a first-grade program for their children; pregnant women choosing birthing options; and college students choosing majors, courses, housing, and summer plans, to name just a few. Here, I'll focus on the studies of college students choosing a major (Galotti, 1999; Galotti, Ciner, Altenbaumer, Geerts, Rupp, & Woulfe, 2006; Galotti, Wiener, & Tandler, 2014).

Many of these studies were *longitudinal* in design—meaning that we asked the same people about their decision-making process at two or more different points in time, in order to study changes over time. At each point, we asked students to describe the *options* they were actively considering (e.g., Psychology, Computer Science, English) as well as the *criteria* they were

using to decide among options (e.g., How many requirements are there? Do I like the profs who teach the classes? Are there labs? Will it help get me into med school?). We also asked students to assess their emotional reactions to the decision-making process (e.g., How stressful was it? How comfortable with the process were they?).

Across studies, college students considered about 4-5 options and about 5-7 criteria. As the final decision drew near, students were likely to reduce the number of options under consideration (from about 4-5 to about 3-4), but not the number of criteria they were using. When we looked at whether or not the same options or criteria were being used at different points in time, the answer was that about half of the options and half of the criteria were different. Students generally reported that this decision was moderately stressful and difficult, and that it was guided by their overall values, with an emphasis on the future. Some work suggests, however, that the way students approach a specific decision is largely a function of what that decision is about. The implication is that how people approach different decisions has at least as much to do with the specifics of a particular decision as it does with the characteristics of the decision maker.

#### References


# References





# Glossary


# Chapter 18

# Affect and Thought: The Relationship Between Feeling and Thinking

JOSEPH FORGAS

University of New South Wales

Since time immemorial, philosophers, writers, and artists have wondered about the intricate relationship between feeling and thinking, affect and cognition. Humans are certainly an emotional species. Our feelings seem to influence and color everything we think and do (Zajonc, 2000), in ways that we do not yet fully understand. Philosophers such as Blaise Pascal put it very succinctly: 'The heart has its reasons that reason does not understand'. Yet apart from some early exceptions (e.g., Rapaport, 1942/1961; Razran, 1940), focused empirical research on the links between affect and cognition has been slow to emerge. One possible reason is the widespread assumption in Western philosophy that affect is an inferior and more primitive faculty of human beings compared to rational thinking, an idea that can be traced all the way to Plato (Adolphs & Damasio, 2001; Hilgard, 1980; see also Chapter 2, "History of the Field of the Psychology of Human Thought"). Affective states indeed have some unique properties. They often have broad non-specific effects on thinking and behavior, can occur spontaneously and often subliminally, they are difficult to control, and they are linked to powerful and sometimes visible bodily reactions. Most importantly, affective states have an invasive quality, influencing our thoughts and behaviors (Dolan, 2002; James, 1890).

Yet, of the two major paradigms that dominated the brief history of our discipline (behaviorism and cognitivism), neither assigned great importance to the study of the functions of affective states, such as moods and emotions. Radical behaviorists considered all unobservable mental events (including affect) as irrelevant to scientific psychology. The emerging cognitive paradigm in the 1960s largely focused on the study of cold and rational mental processes, and initially also had little interest in the study of affect. Thus, understanding the delicate interplay between feeling and thinking still remains one of the greatest puzzles about human nature (Koestler, 1967/1990). It was only in the last few decades that researchers started to focus on how moods and emotions influence how people think and behave.

This chapter reviews what we now know about the multiple roles that affective states play in influencing both the *content* (what we think) and the *process* (how we think) of cognition. After a brief introduction looking at some early work and theories linking affect and cognition, the chapter is divided into two main sections. First, research on affective influences on the *content* of thinking is reviewed, focusing especially on how positive and negative affective states preferentially produce positive and negative thoughts, a pattern of thinking called affect congruence. The second section of the chapter surveys evidence for the *processing effects* of affect, documenting how affect influences the quality of our information processing strategies.

For the purposes of our discussion, affect is used as a generic term to encompass two distinct kinds of feeling states. Moods may be defined as "relatively low-intensity, diffuse, subconscious, and enduring affective states that have no salient antecedent cause and therefore little cognitive content" (Forgas, 2006, pp. 6–7). Distinct emotions in contrast are more intense, conscious, and short-lived affective experiences (e.g., fear, anger, or disgust). Moods tend to have relatively uniform and reliable cognitive consequences, and much of the research we deal with looks at the cognitive consequences of moods. Emotions such as anger, fear, or disgust tend to have more context and situation-dependent effects that are less uniform (e.g., Unkelbach, Forgas, & Denson, 2008).

# Early Evidence Linking Affect and Cognition

Although radical behaviorists showed little interest in affect, Watson's classic conditioning research with Little Albert is an early demonstration of affect congruence in judgments—when negative affect produces negative reactions (Watson & Rayner, 1920). These studies showed that reactions to an initially neutral stimulus, such as a furry rabbit, became more negative after participants experienced unexpected negative affect, elicited by a sudden loud noise. Watson—incorrectly, as it turns out—thought that most complex affective reactions are acquired in a similar way throughout life as a result of ever-more complex and subtle layers of stimulus associations. In a later study linking affect and thought, Razran (1940) found that people responded to sociopolitical messages more favorably when they were in a positive affective state (just received a free lunch!) rather than in a bad affective state (being exposed to aversive smells). Politicians seem to instinctively know this, using positive affect manipulations (upbeat music, free food and drinks, etc.) to improve the likely acceptance of their messages.

In a subsequent psychoanalytically oriented study, Feshbach and Singer (1957) induced negative affect using electric shocks and then instructed subjects to suppress their fear. Fear produced more negative evaluations of another person just encountered, and ironically, this effect became even greater when judges were actively trying to *suppress* their fear. This paradoxical pattern was interpreted as consistent with the psychodynamic mechanism of suppression and projection, suggesting that "suppression of fear facilitates the tendency to project fear onto another social object" (Feshbach & Singer, 1957, p. 286).

Subsequently, Byrne and Clore (1970) returned to a classical-conditioning approach to explore how affective states can color thinking and judgments. They placed participants into pleasant or unpleasant environments (the unconditioned stimuli) to elicit good or bad moods (the unconditioned response), and then assessed their evaluations of a person they just met (the conditioned stimulus; Gouaux, 1971; Griffitt, 1970). As expected, manipulated positive affect reliably produced more favorable judgments than did negative affect. These early studies, although based on very different theoretical models (psychoanalysis, behaviorism, etc.), produced convergent evidence demonstrating an affect congruent bias in thinking.

# 18.1 Affect Congruence: Affective Influences on the Content of Thinking

In the studies described above, positive affect produced more positive thoughts and negative affect produced more negative thoughts. Interest in this pattern of affect congruence re-emerged in the last few decades. Investigators now wanted to understand the information-processing mechanisms that can explain how affect can come to infuse the *content and valence* (positivity or negativity) of cognition. Three convergent theories accounting for affect congruence have been proposed: (1) associative network theories emphasizing underlying memory processes (Bower, 1981; 1991), (2) affect-as-information theory relying on inferential processes (Clore & Storbeck, 2006; Schwarz & Clore, 1983), and (3) an integrative Affect Infusion Model (AIM; Forgas, 1995, 2006), a theory that seeks to explain how different thinking strategies can increase or decrease the extent of affect infusion.

# 18.1.1 A Memory Effect? The Associative Network Explanation

The first cognitive model to explain affect congruence suggested that affective states influence cognition because affect is linked to memory within a shared associative network of memory representations (Bower, 1981). When an affective state is experienced, for whatever reason, that affect may automatically prime or activate units of knowledge or memories previously associated with the same affective state. Such affectively primed constructs are then more likely to be used in subsequent constructive cognitive tasks. For example, Bower (1981) found that happy or sad people were more likely to remember details from their childhood and also remembered more events that occurred in the past few weeks that happened to match their current affective state. Similar affect congruence was also demonstrated in how people interpreted their own and others' observed social behaviors. When happy or sad participants viewed the same videotape of an encounter, judges in a positive affective state saw significantly more skilled, positive behaviors both in themselves and in other people, while those in a negative mood interpreted the same observed behaviors more negatively (Forgas, Bower, & Krantz, 1984).

Further research showed that affect congruence is subject to some limiting conditions (see Blaney, 1986; Bower & Mayer, 1989). Affect-congruence seems most robust (a) when the affective state is clear, strong, and meaningful, (b) the cognitive task is self-referential, and (c) when more open, elaborate, and constructive thinking is used (Blaney, 1986; Bower, 1991; Bower & Mayer, 1989). In general, quick, easy, familiar and regularly performed tasks are less likely to show affect congruence. In contrast, cognitive tasks that call for more constructive, open-ended thinking (such as judgments, associations, inferences, impression formation, and planning behaviors) are most likely to show an affect-congruent pattern (e.g., Bower, 1991; Fiedler, 2002; Forgas, 1995; Mayer, Gaschke, Braverman, & Evans, 1992). This occurs because more open, elaborate processing increases the opportunities for affectively primed memories and associations to be retrieved and incorporated into a newly constructed response (Forgas, 1995; 2006).

# 18.1.2 Affect as a Heuristic? The Affect-As-Information Theory

Following Bower's (1981) work, an alternative theory sought to explain affect congruence by proposing that instead of computing a judgment on the basis of recalled features of a target, individuals may "ask themselves: 'how do I feel about it?' [and] in doing so, they may mistake feelings due to a preexisting state as a reaction to the target" (Schwarz, 1990, p. 529; see also Schwarz & Clore, 1983; Clore & Storbeck, 2006). In other words, rather than properly constructing a response, the pre-existing affective state is used as a heuristic shortcut indicating their reaction to a target. For example, affect incidentally induced by good or bad weather was found to influence evaluative judgments on a variety of unexpected and unfamiliar questions in a telephone interview (Schwarz & Clore, 1983). In a similar situation, we also found affect congruence in survey responses of almost 1000 subjects who completed a questionnaire after they had just seen funny or sad films at the cinema (Forgas, 1995).

The affect-as-information model is closely based on related research showing that people often rely on various shortcuts in their judgments. The model is also related to earlier conditioning models that predicted a blind, unconscious connection between affect and coincidental responses (Byrne & Clore, 1970). This kind of affective influence is far less likely to explain affective influences on more complex cognitive tasks, involving memory and associations where more elaborate computation is required. Affect as a simple, direct source of evaluation seems most likely when "the task is of little personal relevance, when little other information is available, when problems are too complex to be solved systematically, and when time or attentional resources are limited" (Fiedler, 2001, p. 175), as in the casual survey situations studied by Schwarz and Clore (1983), and also in Forgas's (1995) study showing affective influences on responses to a street interview after seeing happy or sad movies. In most realistic situations when people need to think constructively about new, unfamiliar and complex problems, mood-congruent associations in memory offer a more plausible explanation for affect congruence than simply using affect as a heuristic cue.

# 18.1.3 Putting it all Together: The Affect Infusion Model (AIM)

The research reviewed so far suggests that the occurrence of affect congruence in thinking (more positive thoughts in positive mood, more negative thoughts in negative mood) very much depends on *how* a particular cognitive task is processed. The Affect Infusion Model (AIM; Forgas, 1995; 2006) relies on this principle to explain the presence or absence of affect congruence in different situations. The AIM identifies four alternative processing strategies that vary in terms of (a) their *openness* (how much new information needs to be accessed), and (b) the degree of *effort* used in processing a cognitive task. (1) The *direct-access* strategy involves the simple and direct retrieval of a pre-existing response, likely to be used when a task is familiar and of low relevance, producing no affect infusion (for example, if somebody asked your opinion about a familiar target, like President Trump, and you already have a well-defined and stored judgment, simply reproducing this judgment requires no constructive thinking and will not be influenced by how you are feeling at the time). (2) *Motivated processing* occurs when thinking is dominated by a specific motivational objective requiring highly targeted and selective information search and processing strategies that inhibit open, constructive thinking (e.g., when trying hard to make a good impression at a job interview, this objective will dominate your responses, and your affective state will not have much of an affect congruent influence) (Clark & Isen, 1982; Sedikides, 1994).

(3) The third, *heuristic processing strategy* (using whatever easy shortcuts are available) involves low-effort processing used when time, involvement and processing resources are limited (e.g., in the telephone and street survey situations studied by Schwarz & Clore, 1983, and in Forgas, 1995). Heuristic processing only results in affect congruence when affect can be used as a convenient shortcut to infer a reaction (Schwarz & Clore, 1983; see also Clore & Storbeck, 2006). (4) Only the fourth processing style, *substantive processing*, involves constructive and effortful thinking. This kind of thinking should be used when the task is new and relevant and there are adequate processing resources available (for example, trying to form a judgment about a new person you are likely to see a lot of in the future). Substantive processing should produce affect congruence because it increases the likelihood of incorporating affectively primed thoughts and memories in constructing a response (Forgas, 1994; 1999). In summary, the AIM explains how four different processing strategies may promote or inhibit affect congruence in thinking and judgments (Fiedler, 2001; Forgas, 1995). One interesting and counter-intuitive prediction of this model is that sometimes, more extensive and elaborate thinking may actually increase affective distortions in judgments by increasing the likelihood that affectively primed information will be used (Forgas, 1992; Sedikides, 1995). Such a paradoxical pattern has now been found in a range of studies, as we will see below.
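The AIM's four strategies can be loosely tabulated along the two dimensions (an illustrative sketch, not Forgas's formal model; the openness/effort assignments are one reading of the text):

```python
# Illustrative sketch of the AIM's four processing strategies,
# characterized by openness (how much new information is accessed),
# effort, and whether affect infusion is expected.

STRATEGIES = {
    # name: (open_constructive, effortful, affect_infusion_expected)
    "direct access": (False, False, False),  # retrieve a stored judgment
    "motivated":     (False, True,  False),  # selective, goal-directed search
    "heuristic":     (True,  False, True),   # affect used as a shortcut cue
    "substantive":   (True,  True,  True),   # open, constructive thinking
}

def affect_infusion_expected(strategy: str) -> bool:
    """Return whether affect congruence is expected under a strategy."""
    return STRATEGIES[strategy][2]

print(affect_infusion_expected("substantive"))  # True
```

The table makes the counter-intuitive prediction visible: infusion tracks the openness of processing, not its effort, which is why highly effortful substantive processing can show *more* affective bias than quick direct-access responding.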

# 18.1.4 Affect Congruence in Memory

Affect plays a key role in memory. The events we remember are almost always marked out for special attention by their affective quality (Dolan, 2002). And by definition, only the things we actually remember—the available contents of memory—can be used for thinking. Considerable research now shows that affect indeed does have a significant influence on what we remember. People are consistently better at remembering memories that are either consistent with their current affective state (affect congruence), or have been experienced in a similar, matching rather than dissimilar affective state (affect-state dependent memory).

Several studies found that people are better at retrieving both early and recent autobiographical memories that match their current mood (Bower, 1981; Miranda & Kihlstrom, 2005). Depressed people also selectively remember negative experiences and negative information (Direnfeld & Roberts, 2006). This pattern is also confirmed with implicit tests of memory when happy or sad subjects are asked to complete a few letters to make a word that first comes to mind (e.g., *can*- may be completed into words like *cancer* or *candy*; Ruiz-Caballero & Gonzalez, 1994). It turns out that happy people reliably come up with more positive, and sad people with more negative words in such a task. We found that happy or sad participants also selectively remembered more positive and negative details respectively about the good or bad characteristics of people they had read about (Forgas & Bower, 1987). This pattern was also confirmed in a study by Eich, Macaulay, and Ryan (1994), who asked happy or sad students to remember 16 specific episodes from their past. There was a clear affect congruent pattern in what they recalled.

These affect-congruent memory effects occur because an affective state can selectively activate affect-congruent information (Bower, 1981). People will actually spend longer reading and encoding affect-congruent material into a richer pre-activated network of affect-congruent memory associations. Not surprisingly, they are also better in remembering such information later on (see Bower, 1991). Affect may also direct selective attention to affect-congruent information when it is first encountered. For example, affect influences participants' attentional filter, focusing attention on faces that show affect-congruent rather than incongruent expressions (Becker & Leinenger, 2011). Positive affect can also produce a marked attentional bias toward positive, rewarding words (Tamir & Robinson, 2007), and greater attention to positive images (Wadlinger & Isaacowitz, 2006). In contrast, depressed people pay selectively greater attention to negative information (Koster, De Raedt, Goeleven, Franck, & Crombez, 2005), negative facial expressions (Gilboa-Schechtman, Erhard-Weiss, & Jecemien, 2002), and negative behaviors (Forgas et al., 1984).

Such an affect-congruent bias has its dangers, because through selective attention to negative events, negative affect may easily spiral into a state of enduring depression. Fortunately, with non-clinical subjects, this spiral is rare, as sad people escape the vicious circle of negativity by automatically switching to an *affect-incongruent* processing strategy after a while. For example, after initially retrieving negative memories, non-depressed participants in a negative mood spontaneously shifted to retrieving positive memories as if to lift their mood (Josephson, Singer, & Salovey, 1996).

# 18.1.5 Affect-state Dependence in Memory

Affective states also impact on memory by selectively facilitating the retrieval of information that has been learnt in a *matching* rather than a *non-matching* affective state. Such affect-state dependent memory is a special case of state dependence. We all remember information better when the same state is reinstated in which the event was first encountered. For example, a list of words learnt when you were feeling happy is more likely to be remembered when you feel happy again rather than sad at the time of retrieval (Bower, 1981). In extreme cases of state dependency, serious memory deficits can also occur in patients with alcoholic blackout, chronic depression, dissociative identity and other psychiatric disorders (Goodwin, 1974; Reus, Weingartner, & Post, 1979). Bipolar patients with intense affective fluctuations also show a marked pattern of affect-state dependence in remembering (Eich, Macaulay, & Lam, 1997).

Affect-state dependence is a rather subtle effect (Bower & Mayer, 1989; Kihlstrom, 1989), and is most likely to be found when the task requires open and constructive processing. Accordingly, affect-state dependence is more likely in constructive free recall tasks rather than in recognition tasks (Eich, 1995; Bower & Mayer, 1989), and more robust when the recalled events are self-relevant and the encoding and retrieval affect are distinctive, well matched and salient (Eich, 1995; Eich & Macaulay, 2000; Ucros, 1989). There are also important individual differences between people in their susceptibility to affect congruence and state-dependent memory (Bower, 1991; Smith & Petty, 1995).

# 18.1.6 Affect Infusion in Associations and Judgments

The increased availability of affect-related information in memory should also have a marked influence on the kinds of associations and inferences people make, and subsequently on how complex or ambiguous social information is interpreted. Bower (1981) found that after receiving a mood induction, people generated more mood-congruent ideas when daydreaming or free associating to ambiguous TAT pictures. Happy people also generated more positive associations to words such as *life* (e.g., *love* and *freedom*) than did sad subjects, who produced more negative associations (e.g., *struggle* and *death*). The selective priming and greater availability of affect-congruent ideas in memory can ultimately also influence complex social judgments, as judges tend to rely on their most available, affect-consistent thoughts when interpreting complex and ambiguous stimuli. For example, after an affect induction, judges made significantly more affect-congruent judgments when evaluating faces (Forgas, 2013; Gilboa-Schechtman et al., 2002), and formed more affect-consistent impressions about others as well as themselves (Forgas et al., 1984; Forgas & Bower, 1987; Sedikides, 1995).

Paradoxically, affective influences on judgments tend to be greater when the targets require more constructive and elaborate processing because they are more complex and atypical (e.g., Forgas, 1992; 1995). Several studies found that the more people needed to think in order to compute a difficult and complex judgment, the greater the likelihood that their affectively primed ideas influenced the outcome. In one experiment, participants were asked to form impressions about characters who had either typical and predictable features (e.g., typical medical students), or were atypical and complex (e.g., a medical student who is also a hippy; Forgas, 1992). Affect had a significantly greater impact when judges had to form impressions of such complex, atypical characters (Figure 18.1).

These judgmental effects can be quite robust, even influencing judgments about very well-known people, such as a person's real-life partners. Forgas (1994) in one experiment showed that temporary affective state significantly influenced judgments about one's partner as well as real, recurring relationship conflicts. Ironically, affective influences were stronger when judgments about more complex, difficult relationship situations required longer and more constructive processing. In other words, the more one needs to think about a judgmental task, the more likely that one's prevailing affective state will come to bias the outcome. Some personality characteristics, such as high trait anxiety, may interfere with these effects, as highly anxious people are often less likely to process information in an open, constructive manner.

Figure 18.1: Affect-congruence in judgments is magnified when the target is complex and unusual and so requires more constructive and extensive processing (after Forgas, 1992).

### 18.1.7 Affect and Self-Perception

Can fluctuating affective state also bias how we think about ourselves? It turns out that the answer is 'yes' (Sedikides, 1995). For example, students in a positive affective state are more likely to claim credit for their success in a recent exam, but are less likely to blame themselves for failure (Forgas, 1995). These findings were replicated in a study by Detweiler-Bedell and Detweiler-Bedell (2006), who concluded that consistent with the AIM, "constructive processing accompanying most self-judgments is critical in producing mood-congruent perceptions of personal success" (p. 196). Sedikides (1995) further found that central, well-established ideas about ourselves tend to be processed more automatically and less constructively and thus are less likely to be influenced by how we happen to feel at the time. In contrast, judgments about more "peripheral" and vague self-conceptions require more substantive processing and are more influenced by a person's affective state. Long-term, enduring individual differences in self-esteem also play a role, as high self-esteem people are less influenced by their temporary affective state when judging themselves (Smith & Petty, 1995). Low self-esteem judges in turn have a less clearly defined and less stable self-concept and are more influenced by their fluctuating affective states (Brown & Mankowski, 1993).

These results are consistent with the Affect Infusion Model described previously (Forgas, 1995), and show that affectively primed thoughts and associations are more likely to influence judgments when more extensive, open, and constructive processing is required. Other work suggests that affect congruence in self-judgments may eventually be spontaneously corrected as people shift to a more targeted, motivated thinking style, reversing the initial affect-congruent pattern (Sedikides, 1994).

# 18.1.8 Affect Congruence in Social Behaviors

As we have seen, affective states often influence what people think. Because planning strategic social behaviors necessarily requires some degree of constructive, open information processing in calculating what to do (Heider, 1958), affect should ultimately also influence how people actually behave in social situations. Positive affective states, by activating more positive evaluations and inferences, should elicit more optimistic, positive, confident, and cooperative behaviors. In contrast, negative mood may produce more avoidant, defensive, and unfriendly behaviors. In one experiment, positive and negative affective states were induced in people (by showing them happy and sad films) before they engaged in a complex, strategic negotiation task (Forgas, 1998a). Those in a positive affective state employed more trusting, optimistic, and cooperative and less competitive negotiating strategies, and actually achieved better outcomes. Those in a negative mood were more pessimistic, competitive and ultimately, less successful in their negotiating moves (Figure 18.2).

Other kinds of social behaviors, for example, the way people choose their words when formulating a request, are also significantly influenced by how the person feels at the time (Forgas, 1999). Individuals in a negative affective state tend to make more pessimistic implicit inferences about the likely success of their requests, and so they use more polite, elaborate, and cautious request forms. Positive affect has the opposite effect: it increases optimism and results in more confident and less elaborate and polite request formulations.

Affect also has an impact on how people *respond* to an unexpected real-life request. In a realistic field study, students in a library were induced into a positive or negative affective state by finding folders on their desks containing affect-inducing pictures and text (Forgas, 1998b). Soon afterwards they received an unexpected polite or impolite request from a passing student (actually, a confederate) asking for some stationery needed to complete an essay. There was a marked affect-congruent pattern. Negative mood resulted in more critical, negative evaluations of the request and requester, and reduced compliance, but positive mood yielded a more positive evaluation and greater willingness to help. These effects were even stronger when the request was more unexpected and impolite and so required more substantive processing.

Affect infusion can be particularly important when performing complex strategic social behaviors such as *self-disclosure*, which plays a critical role in the development and maintenance of intimate relationships. By facilitating access to affect-congruent memories and associations, people in a positive affective state disclose more positive, intimate, varied, and abstract information about themselves (Forgas, 2011). Negative affect has exactly the opposite effect, resulting in less open and positive self-disclosure. Studies such as these provide convergent evidence that temporary fluctuations in affective state can result in marked changes not only in thinking (memory, associations, and judgments), but also in actual social behaviors. In other words, our affective states play an important informational function in thinking and responding to the social world. These effects are most marked when an open, constructive processing style is adopted (Forgas, 1995; 2006) that increases the scope for affectively primed information to become activated and used (Bower, 1981).

# 18.2 Affective Influences on Thinking Strategies

The evidence surveyed so far clearly shows that affect has a marked *informational* influence on the valence and *content* of our thinking, resulting in affect-congruent effects on memory, attention, associations, judgments, and social behaviors. Affect also has a second effect on cognition, influencing *how* people think, that is, the *process of cognition*. This section will look at evidence for the information-processing consequences of affect. Early studies suggested that people in a positive affective state tend to think in a more superficial and less effortful way. Those feeling good made up their minds more quickly, used less information, tended to avoid more effortful and systematic thinking, yet, ironically, also appeared more confident about their decisions. Negative affect, in contrast, seemed to produce a more effortful, systematic, analytic, and vigilant processing style (Clark & Isen, 1982; Isen, 1984; Schwarz, 1990). Positive affect can also produce distinct processing advantages, as happy people tend to adopt a more creative, open, and inclusive thinking style, use broader cognitive categories, show greater mental flexibility, and perform better on secondary tasks (Bless & Fiedler, 2006; Fiedler, 2001; Fredrickson, 2009).

Figure 18.2: Affect-congruent influences on negotiating strategies: positive affect promotes cooperation and making deals, negative affect promotes competition (after Forgas, 1998a).

# 18.2.1 Linking Affect to Processing Style

How can we explain such affectively induced processing differences? Early theories emphasized *motivational* factors. According to the *mood maintenance/mood repair* hypothesis, positive affect may motivate people to maintain this pleasant state by avoiding effortful activity such as elaborate thinking. In contrast, negative affect is aversive, and should motivate people to shift to a more vigilant, effortful information processing style as a useful strategy to improve their affect (Clark & Isen, 1982; Isen, 1984). A somewhat similar *cognitive tuning* account (Schwarz, 1990) proposed that affective states have a fundamental signaling/tuning function, automatically informing us about the level of vigilance and processing effort required in a given situation. Thus affective states have important adaptive and motivational functions, consistent with a functionalist/evolutionary view of affect (Dolan, 2002). However, this view has been challenged by some experiments demonstrating that positive mood does not always reduce processing effort, as performance on simultaneously presented secondary tasks is not necessarily impaired (e.g., Fiedler, 2001).

An integrative theory by Bless and Fiedler (2006) suggests that the fundamental, evolutionary significance of affect is not simply to regulate processing *effort*, but rather to trigger equally effortful but qualitatively different *processing styles*. The model identifies two complementary adaptive functions, *assimilation* and *accommodation*, triggered by positive and negative affect, respectively (cf. Piaget, 1954). Assimilation means using existing internal knowledge to understand the world, whereas accommodation requires greater attention to new, external information to modify internal representations (Bless & Fiedler, 2006, p. 66; Piaget, 1954; see also Chapter 10, "Decision Making", on dual-process theories in psychology). Positive affect signals safety and familiarity, so that existing knowledge can be relied upon. In contrast, negative affect functions as a mild alert signal, triggering more careful and accommodative processing. This processing dichotomy bears more than a passing resemblance to Kahneman's (2011) distinction between System 1 and System 2 thinking. In important ways, it appears that positive affect promotes faster, simpler, and more heuristic and creative thinking, while negative affect produces a slower, more systematic and more analytic thinking style.

Several experiments show that positive affect indeed promotes more assimilative and abstract language representations, the use of fewer and broader cognitive categories, and a greater focus on the global rather than the local features of a target (Forgas, 2006; Fredrickson, 2009; Gasper & Clore, 2002; Isen, 1984; Koch, Forgas, & Matovic, 2013). Further, positive affect increases, and negative affect decreases, people's tendency to rely on their preexisting internal knowledge in cognitive tasks, and improves memory for self-generated information (Fiedler, Nickel, Asbeck, & Pagel, 2003). Thus, *both* positive and negative affect can confer processing advantages, albeit in response to different situations. In contrast to the dominant hedonic emphasis on the benefits of positive affect in our culture, an important implication of this model is that positive affect is not always advantageous, and negative affect can often produce distinct processing advantages, as the experiments to be reviewed next will show.

# 18.2.2 Can Negative Affect Improve Cognitive Performance?

As negative affect promotes more accommodative, externally focused processing, this should improve memory as well. In one field experiment, happy or sad shoppers (on sunny or rainy days, respectively) saw a variety of unusual small objects displayed in a local shop (Forgas, Goldenberg, & Unkelbach, 2009). Their affective state (induced by good or bad weather on that day) had a significant effect on memory. Those in a negative mood (on rainy days) had significantly better memory for the details of what they saw in the shop than did happy people (on sunny days; Figure 18.3). Laboratory experiments confirmed this pattern, as memory for the details of essays read was also significantly better in a negative compared to a positive affective state (Forgas, 2013).

Negative affect can also improve recall and reduce errors in eyewitness memory (Forgas, Vargas, & Laham, 2005). In one experiment using a real-life incident, students witnessed a staged aggressive encounter during a lecture (Forgas et al., 2005, Exp. 2). A week later, while induced into a positive or negative affective state, witnesses received questions about the incident that included false, misleading information. Positive affect increased the tendency to assimilate these false details into memory, but negative affect eliminated this source of error in eyewitness reports. Conceptually similar results were reported by Clore and Storbeck (2006), who also found that individuals in a negative mood were significantly less likely to show false memory effects than those in positive moods, consistent with negative affect promoting more attentive and accommodative thinking. Paradoxically, even though happy affect *reduced* eyewitness accuracy, it *increased* eyewitness confidence, suggesting that witnesses had no real internal awareness of the processing consequences of their affective states.

# 18.2.3 Affective Influences on Judgmental Accuracy

Many common judgmental errors occur in everyday life because people are imperfect and often inattentive information processors (Kahneman, 2011). For example, the *fundamental attribution error* (FAE) or *correspondence bias* refers to the pervasive tendency of people to attribute intentionality and internal causation to an actor and to underestimate external, situational constraints (Gilbert & Malone, 1995). This happens because people focus on the most salient information, the actor, and ignore peripheral cues. As negative mood promotes more attentive, detail-oriented processing, it should reduce the incidence of this common judgmental bias. This was confirmed in one experiment (in Forgas, 2013) where happy or sad subjects were asked to judge the attitudes of the writer of an essay that was either freely chosen or was assigned to them. Happy persons were more likely, and sad people less likely, to commit the fundamental attribution error by incorrectly attributing internal causation based on a coerced essay. Memory data confirmed that those in a negative affective state also remembered more details, consistent with accommodative processing.

Figure 18.3: Mean number of target items seen in a shop correctly remembered as a function of affective state (happy vs. sad) induced by good or bad weather (after Forgas, Goldenberg, & Unkelbach, 2009).

Many judgmental inaccuracies are due to people's excessive reliance on judgmental shortcuts or heuristics (Kahneman, 2011). It seems that positive affect may increase, and negative affect reduce, such judgmental biases when forming impressions. One relevant example is primacy effects, when early information about a person dominates our subsequent impressions. In one experiment, participants formed impressions about a character (Jim) described in two paragraphs in either an introvert–extrovert or an extrovert–introvert sequence (Forgas, 2011). Subsequent impression-formation judgments showed that positive affect significantly increased reliance on heuristic primacy cues (relying on whatever information came first; Figure 18.4). In contrast, negative mood, by recruiting a more accommodative, System 2 processing style, almost eliminated the usual primacy effect. We should note, however, that negative affect can only improve judgmental accuracy when relevant stimulus information is actually available. Ambady and Gray (2002) found that in the absence of diagnostic details, "sadness impairs [judgmental] accuracy precisely by promoting a more deliberative information processing style" (p. 947).

# 18.2.4 Affective Influences on Stereotyping

Positive affect, by promoting assimilative thinking and the use of pre-existing knowledge in judgments, may also promote stereotyping. For example, Bodenhausen, Kramer, and Süsser (1994) found that happy participants relied more on ethnic stereotypes when evaluating a student accused of misconduct, whereas negative mood reduced this tendency. Generally speaking, negative affect tends to promote greater attention to specific, individuating information when forming impressions of other people (Forgas, 2013). Similar effects were demonstrated in an experiment in which happy or sad subjects had to form impressions about the quality of a brief philosophical essay allegedly written by a middle-aged male academic (stereotypical author) or by a young, alternative-looking female writer (atypical author). Once again, results showed that positive affect increased the judges' tendency to be influenced by irrelevant stereotypical information about the age and gender of the author. In contrast, negative affect eliminated this judgmental bias (in Forgas, 2013).

Figure 18.4: Primacy effects on impression formation are increased by positive affect and eliminated by negative affect: judges perceive the target person as more extroverted when the extroverted description comes first, and this primacy effect is strongest in a positive rather than negative mood (vertical axis = extraversion judgments; differences between the columns indicate the size of the primacy effect; after Forgas, 2011).

Relying on stereotyped expectations can ultimately also impact behavior. We tested this prediction using the 'shooter bias' paradigm assessing subliminal aggressive tendencies, where happy or sad people had to make rapid on-line decisions about whether to shoot at rapidly presented videotaped targets who did or did not appear to be holding a weapon (Correll et al., 2007). US subjects often display a strong implicit bias on this task and shoot more at Black than at White targets (Correll et al., 2007). In our study we manipulated the images so that some targets appeared to be Muslims, wearing a turban, while in the control condition the same person was shown without a turban. In this case, we found a strong "turban effect": Muslim targets elicited more aggression. Yet the most intriguing finding was that positive affect further *increased* this selective tendency to shoot at Muslim targets, while negative affect reduced it (Unkelbach, Forgas, & Denson, 2008). Thus, affective influences on stereotyped thinking may extend to influencing actual aggressive behaviors as well.

# 18.2.5 Affective Influences on Gullibility

Much of our knowledge about the world is based on second-hand information we receive from others that is often ambiguous and not easily verified (e.g., hearsay, gossip, urban myths, fake news, conspiracy theories, trivia claims). Gullibility (accepting invalid information as true) can be just as problematic as rejecting valid information (excessive skepticism). Affective states also seem to play a role in how such decisions are made (Forgas, 2008; 2013; in press). For example, one study asked happy or sad participants to judge the probable truth of a number of urban legends and rumours (Forgas, 2018). Positive mood promoted greater gullibility for novel and unfamiliar claims, whereas negative mood promoted skepticism, consistent with a more externally focused, attentive, and accommodative thinking style. In another experiment, participants' recognition memory was tested two weeks after they were informed about the truth or falsity of various claims taken from a trivia game. Sad participants were better able to correctly distinguish between the true and false claims they had seen previously. In contrast, happy participants tended simply to rate previously seen, and thus familiar, statements as likely to be true (in essence, a familiarity/fluency effect). This pattern suggests that happy affect promoted reliance on the simple "what is familiar is true" heuristic, whereas negative mood conferred a clear cognitive advantage, improving judges' ability to accurately remember the truth value of the statements.

# 18.2.6 Mood Effects on Bullshit Receptivity: Perceiving Meaning Where There is None

Perhaps the most striking form of gullibility occurs when people see meaning in meaningless, randomly generated information. Such absurd gullibility has been repeatedly demonstrated even in ideologically biased academic journals dealing with postmodernist theory, radical feminism and 'grievance studies'. Several such academic journals accepted for publication a number of articles composed of intentionally meaningless jargon and politically correct verbiage (Sokal & Bricmont, 1998). Pennycook et al. (2015) confirmed a similar effect, showing that people often perceive vacuous, pseudo-profound "bullshit" text as meaningful.

Can affect influence bullshit receptivity? One experiment asked participants in a positive or negative mood (after viewing cheerful or sad videotapes) to rate the meaningfulness of two kinds of verbal 'bullshit' text, including vacuous New Age pronouncements (e.g., "Good health imparts reality to subtle creativity") and meaningless scientific-sounding psychological jargon phrases (e.g., "subjective instrumental sublimations"; Forgas, Matovic, & Slater, 2018). People in a positive mood were more gullible and saw more 'meaning' in these nonsense statements than did those in the neutral and negative mood groups (see Figure 18.5). Positive mood judges were not only more gullible, but were also faster to produce a judgment, and had worse recall and recognition memory than those in the neutral and negative mood conditions, consistent with the prediction that positive mood produced a less attentive information processing style.

In a related study, we also looked at mood effects on bullshit receptivity using abstract visual rather than verbal stimuli. Participants in public places received a mood induction (reminiscing about positive or negative life episodes) and then judged the meaningfulness of four modern abstract expressionist paintings. Positive mood again increased the perceived meaningfulness of these abstract images compared to negative mood.

Figure 18.5: Mood effects on bullshit receptivity (seeing meaning in nonsense sentences): positive mood increased gullibility compared to neutral and negative mood (after Forgas, Matovic, & Slater, 2018).

# 18.2.7 Mood Effects on Decoding Interpersonal Messages

Interpersonal communications are often also ambiguous and have no objective truth value (Heider, 1958; see also Chapter 12, "Language and Thought"). Accepting or rejecting such messages is critically important for effective social interaction. For example, people in a negative affective state were significantly less likely than those in a positive state to believe that various facial expressions were authentic (in Forgas, 2013). Taking this line of reasoning one step further, can affective states also influence people's ability to detect deception? In one study, happy or sad participants watched videotaped interrogations of suspects accused of theft who were either guilty or not guilty (Forgas & East, 2008). As predicted, those in a positive mood were more gullible, as they accepted more denials as true. In contrast, negative affect resulted in more guilty judgments, and also improved the participants' ability to correctly identify targets who were deceptive. So negative affect not only increased overall skepticism, but also improved people's ability to accurately detect deception.

Detecting ambiguity in *verbal messages* is an equally important task. In one study (Matovic, Koch, & Forgas, 2014), participants received a mood induction (watching happy or sad films) and were then asked to detect confusing, ambiguous sentences whose meaning was unclear. Results showed that negative mood promoted more accurate detection of verbal ambiguity, consistent with the adoption of a more accommodative processing style. This was also reflected in more extensive processing and more accurate recall in a negative mood (Figure 18.6).

# 18.2.8 Affective Influences on Behavior

Figure 18.6: The effects of positive and negative mood on (a) the ability to correctly identify ambiguous sentences (left panel), (b) the time taken to process the task (middle panel), and (c) the ability to remember the target sentences (right panel; after Matovic et al., 2014).

Our behavioral strategies may also benefit when negative affect triggers a more thorough processing style. To take one example, negative affect may optimize the way people process, produce, and respond to persuasive messages. In a number of studies, participants in a negative affective state were more sensitive to message quality, and were more persuaded by strong than by weak arguments. In contrast, those in a positive affective state were not influenced by message quality, and were equally persuaded by strong and weak arguments (e.g., Sinclair, Mark, & Clore, 1994). Affective states may also influence the *production* and quality of persuasive messages. Those experiencing induced negative affect produced significantly higher quality and more effective persuasive arguments on topical issues than people in a positive state (Forgas, 2013). Negative affect also resulted in identifiable benefits when performing demanding interpersonal tasks, such as ingratiation (Forgas, Matovic, & Slater, 2018), consistent with the adoption of a more externally oriented, concrete processing style (Bless & Fiedler, 2006; Fiedler, 2001). Overall, participants in a negative mood perform significantly better in complex communication tasks, and are less likely to violate the rules of effective communication than those in a positive affective state (Koch, Forgas, & Matovic, 2013).

Decisions about the way we actually treat others may also be influenced by affective states. For example, affect was found to influence the degree of *selfishness* versus *fairness* when people allocate resources between themselves and others in strategic games, such as the dictator game (Tan & Forgas, 2010). Positive affect, by increasing internally focused, assimilative processing, resulted in more selfish allocations. Negative affect, in contrast, by focusing greater attention on external information such as the norm of fairness, produced significantly more generous and fair allocations in a series of decisions.

# 18.3 Conclusions

Understanding how affect influences thinking remains one of the most fascinating questions in psychology, an issue that has also occupied philosophers since time immemorial. Recent neuropsychological research suggests that these two fundamental human faculties, feeling and thinking, operate in close interdependence, with affect playing an evolutionary signalling role, alerting the organism to significant events in the environment (Dolan, 2002). This chapter reviewed experimental evidence that broadly confirms this view, and suggested that the influence of affect on thinking can be classified into two major kinds. *Informational effects* impact the content and valence (positivity vs. negativity) of thinking, usually resulting in affect congruence. *Processing effects* occur because affective states trigger qualitatively different, more or less assimilative vs. accommodative processing strategies.

The evidence reviewed here highlights the potentially adaptive and beneficial processing consequences of both positive and negative affective states. Contrary to the popular preoccupation with the universal desirability of positive affect in Western culture, the research shows that negative affect can often produce important adaptive advantages, improving memory, judgments, and behavioral strategies (Forgas, 2013; in press). The implication is that our persistent and one-sided emphasis on positivity and happiness may be misplaced; instead, both negative and positive affect should be accepted as a normal part of human functioning (see also Chapter 19, "Culture and Thought"). Of course, intense and enduring negative affective states such as depression can be hugely debilitating, and require clinical intervention.

In summary, there is now clear evidence that affective states have a powerful, yet often subconscious influence on *what* people think (content effects) as well as *how* people think (processing effects). These effects are often subtle and subject to a variety of boundary conditions and contextual influences. A better understanding of the complex interplay between affect and cognition remains one of the most important tasks for psychology as a science. A great deal has been achieved in the last few decades, but in a sense, the enterprise has barely begun. Hopefully this chapter will contribute to a better understanding of the fascinating relationship between affect and cognition.

#### Summary


#### Review Questions


#### Hot Topic

Joseph Forgas

The last few years have produced genuine insights into the influence of affective states on thinking. The current research project seeks to extend this work in two new directions. First, several experiments investigate affective influences on the way people communicate, including the sending and decoding of both verbal and nonverbal messages. Recently completed experiments showed that, paradoxically, mild negative affective states seem to promote a more attentive and externally oriented information processing style that results in more competent and successful communication strategies. For example, participants in a negative affective state were better at both producing and dealing with ingratiating messages, and they were also better at constructing effective verbal messages in compliance with normative conversational requirements (Matovic & Forgas, 2018). In another ongoing experiment, we are also looking at the influence of affective states on verbal creativity. For example, we are asking happy or sad participants to produce suitable captions for various cartoon drawings, or to formulate verbal responses in conflict situations, and the quality of their responses will be evaluated.

The second line of research explores how affective states influence judgments involving gullibility vs. skepticism. In particular, we are interested in the possibility that negative affect may reduce gullibility and increase skepticism. In a post-truth age of 'fake news' and the widespread use of manipulative misinformation both in commerce and in public life, understanding what factors promote critical thinking is of great practical importance. Several of our earlier experiments suggested that negative affect can reduce people's susceptibility to misleading information in their eyewitness memories (Forgas, Vargas, & Laham, 2005). Further, negative affect also reduced the 'truth bias', the tendency to believe ambiguous information to be true simply because it happens to be salient and can be processed more easily (Koch & Forgas, 2012). Following on from this work, our recent studies looked at the phenomenon of 'bullshit receptivity'—the tendency for people to believe that meaningless, randomly generated gibberish text is actually meaningful. We used randomly generated New Age pronouncements from the work of Deepak Chopra, a New Age guru, as the stimuli, as well as randomly generated psychological jargon terms. We found that participants who were induced into a positive affective state (after watching cheerful, happy videos) were significantly more gullible and showed higher 'bullshit receptivity' than those in a negative affective state.

In a companion experiment, we asked happy and sad participants to judge the meaningfulness of various abstract expressionist paintings. Again, positive affect increased and negative affect reduced their willingness to perceive meaning in these images. Further studies will look at the reasons why these effects occur. For example, the universal human tendency to seek and find patterns in otherwise random information may also be influenced by affect. The evolutionary significance of these mild, but reliable affective influences on how we see and evaluate complex information will also be explored. The role of affective states in promoting or inhibiting mental flexibility—the ability to see multiple meanings in ambiguous information—will also be studied, as a step towards better understanding the role of affect in why people often accept dubious information.

#### References




*cial cognition and behaviour* (pp. 65–84). New York: Psychology Press.


*Social Psychology*, *64*, 421–430. doi:10.1037//0022- 3514.64.3.421


tion. *Journal of Abnormal and Social Psychology*, *55*, 283–288. doi:10.1037/h0042811


Mayer, J. D., Gaschke, Y. N., Braverman, D. L., & Evans, T. W. (1992). Mood-congruent judgment is a general effect. *Journal of Personality and Social Psychology*, *63*, 119–132. doi:10.1037/0022-3514.63.1.119


*Personality and Social Psychology*, *69*, 759–777. doi:10.1037//0022-3514.69.4.759


# Glossary


to infer their reaction rather than computing a response based on the actual features of the stimulus. 342


# Chapter 19

# Culture and Thought

#### MARY GAUVAIN

University of California, Riverside

Throughout the day, people solve many different types of problems. The nature of these problems and the way that people understand and think about them can have enormous consequences for individuals and their well-being. Psychologists have great interest in this process and one thing is increasingly clear—in order to understand human thinking it is necessary to take culture into account (Greenfield, Keller, Fuligni, & Maynard, 2003). This insight is based on research that shows that culture, the natural environment or habitat of the human species, is an essential and inextricable part of human psychological experience, including thought.

This chapter describes the relation between culture and thought. It begins with a brief historical account of how culture has been studied in psychological research on human cognition. We then describe how culture becomes part of individual mental functioning. Throughout the chapter, the focus is on both the content and process of human cognition. Content includes behaviors and other psychological properties, such as knowledge. Process is about how thinking works and includes mental functions such as attention, perception, reasoning, classification, memory, problem solving, and planning. Culture plays a significant role in determining both the content and the process of human thinking.

To illustrate these ideas, findings from research in the area of spatial cognition, the understanding and use of space, are described. Navigating in and using large-scale space effectively are critical to the everyday functioning and the survival of all human beings. The importance of spatial knowledge, along with variations across cultural settings in the environment and the resources available for understanding and using space, makes this a rich area in which to study culture and thought (Dasen & Mishra, 2010). What is clear from this research is that, across cultures, there exists a vast array of solutions for solving spatial problems, and these solutions affect how people explore, learn about, and remember the world around them. To help people solve spatial problems, cultures, over the course of human history, have devised various social conventions (e.g., ways of describing space, teaching people about how to understand and use space) and symbolic and material ways of encoding and representing spatial information (e.g., maps, models, compasses, frames of reference). These cultural tools are used to solve spatial problems including how people communicate spatial information (e.g., directions), identify locations, orient themselves in space, and find their way around. These cognitive skills and the practices associated with them are highly valued in cultures and, as such, they are shared by community members and passed across generations in the process of cognitive socialization (Gauvain & Perez, 2015b).

# 19.1 A Brief Historical Look at Psychological Research on Culture and Cognition

Psychologists have been interested in the relation between culture and human cognition for well over a century. At the very beginning of the discipline in the late 1800s, Wilhelm Wundt, a founder of modern psychology, was concerned with how cultural-historical forms, such as language and methods of reasoning, affect cognitive functions (Cole, 1996). At the same time, Wundt and other psychologists were also committed to studying human psychology experimentally, an approach to research that makes it very difficult to study culture. This is because two principal features of the experimental method, random assignment and manipulation, cannot be used—a person cannot be randomly assigned to a culture nor can culture be experimentally manipulated (Whiting, 1976). In short order, the attention of these early researchers landed on topics better suited to experimentation, such as physiological and perceptual psychology. As a result, in the early 20th century, the study of culture and human cognition, at least among psychologists in the U.S. and Europe, declined significantly. Interestingly, at this same time, there was strong interest in Russia, where Lev S. Vygotsky and other Activity Theorists were putting forward exciting ideas about culture and cognition, many of which are taken up later in the chapter when the sociocultural approach is described (Wertsch, 1985).

By the mid-20th century in American and European psychology, there was renewed interest in culture and cognition. It was fueled, in part, by the "cognitive revolution" occurring in psychology at the time (Bruner, 1957; Neisser, 1967) along with a number of practical concerns that had great societal significance. Of particular importance was the need to understand cognitive variation in human performance on studies that included individuals from different social or cultural backgrounds (Munroe & Gauvain, 2010). Some researchers observed that cognitive performance varied systematically with participants' social class and their experience with Western forms of schooling (e.g., see Cole, Gay, Glick, & Sharp, 1971). Interestingly, at the same time, the research participants, both children and adults, who had performed poorly on conventional laboratory assessments of cognition were observed using impressive cognitive capabilities in their daily lives, including spatial knowledge, reasoning, classification, and linguistic and number systems (e.g., see Gladwin, 1970; Hutchins, 1983; Lancy, 1983; Serpell, 1979). Moreover, these skilled performances resonated closely with the practices and values of the participants' cultural group.

These observations provided insights that may seem obvious in hindsight but were, at the time, quite profound. First, they suggested that human cognitive performance is better when it is assessed on activities and skills that people practice and that are valued in their culture. Second, the more a cognitive assessment deviates from the familiar context in which an individual lives, the poorer the person's cognitive performance will be. Third, because the patterns were similar for children and adults, the connection between culture and cognition exists throughout the lifespan. And, finally, results that demonstrate better cognitive performance in people who live in Western, more industrialized cultures are often based on assessments that favor their experiences. In many cases, they reflect the cultural background and values of the researchers themselves. When taken together, these observations set the stage for a new generation of research on culture and human cognition, one based on the idea that experience in culture is fundamental to the development and expression of human thinking.

Since that time, two different approaches to studying culture and cognition have been used (Göncü & Gauvain, 2012; Table 19.1). One, the cross-cultural approach, focuses on comparisons across cultures, while the other approach, based on the area of research known as cultural psychology, concentrates on processes and systems of meaning within cultures. Each of these approaches has strengths and limitations.

Table 19.1: Contemporary approaches for studying culture and cognition.

For Margaret Mead (1931), a founder of the cross-cultural tradition, this approach is essentially a type of experimental research design, one that investigates how natural variations in culture affect the human experience. Despite this hopeful point of view, the approach has, in practice, fallen short of this goal. Most significantly, it is prone to biases that favor one cultural group, typically the one similar to the researchers' own background, over other groups. Also, over time, research based on this approach resulted in a number of unsubstantiated assumptions about universality, most often by identifying the performances of Western middle-class participants living in industrialized communities as normative or optimal and applying deficit interpretations to participants whose performances do not match up (e.g., Cole et al., 1971; Rogoff, 2003; Serpell, 2017; Shweder, 1990). Studies of within-nation cultural differences that use this method, such as research conducted in the U.S. in which children from low-income communities are compared with their middle-class counterparts, have often been similarly flawed when commonalities between groups are overlooked and differences are interpreted as the deficits of low-income children (Cole, 1996; Rogoff, 2003).

The cultural psychology approach was, in part, developed to address these limitations (Göncü, Tuermer, Jain, & Johnson, 1999; Shweder et al., 1998). It avoids cross-cultural comparisons and takes issue with the use of one culture as the standard or norm in such comparisons. Rather, it views culture as an inherently integrated system of meaning that provides organization and direction for human cognition and learning. In this view, culture is psychologically experienced and takes form in individual thinking and behavior. Research based on this approach has concentrated on how cultural meanings are expressed and communicated in the day-to-day functioning of community members through the customary practices, values, and beliefs of the group (Goodnow, Miller, & Kessel, 1995; Shweder et al., 1998). Children, over the course of development, are socialized into these traditions, values, and practices through their participation in regular events and activities (Rogoff, 2003). Cultural knowledge and ways of thinking are conveyed to young and new community members socially, both through direct social contact (i.e., social interaction) and through less direct, but nonetheless social, forms of information exchange such as rituals, customs, and shared tools and resources, including technology (Gauvain & Nicolaides, 2015).

Some contemporary researchers working from this approach are called sociocultural or sociohistorical psychologists, and they base many of their ideas on the aforementioned insights of Vygotsky and other Activity Theorists (Cole, 1996; Vygotsky, 1978). Sociocultural approaches hold the view that human thinking is culturally mediated, that is, it takes place in historically situated activities that are informed and guided by culture. Culture becomes part of individual psychological experience as people engage with the practices, institutions, and tools in settings where the accumulated knowledge of the culture is used and made available to new members. Over the last decades, this view has helped shift attention away from a view of human cognition as a solitary, individual, and internally driven process towards one that sees cognition as emerging from the coordination of inherent human abilities and cultural systems of meaning.

This chapter draws on empirical evidence from both the cross-cultural and within-culture research traditions. As stated, each approach has strengths, and they can be used in a complementary way to guide theory and research (Van de Vijver, Hofer, & Chasiotis, 2010). That said, each approach also has limitations. The ultimate goal is to take culture into account by benefitting from the unique insights each approach can offer while avoiding problems associated with their earlier use and interpretation. For instance, cross-cultural research can be useful when researchers do not assign greater value or worth to any cultural pattern or behavior. Focusing on a common point of reference across cultures, such as behaviors related to universal developmental and cognitive tasks (e.g., early dependency on caregivers, spatial navigation; Van de Vijver et al., 2010), is particularly useful. Careful sampling and data analysis are critical in order to avoid ethnocentrism that reifies any particular way of life.

Research rooted in cultural psychology can provide depth of understanding about a culture. However, it is important not to adopt a monolithic view of a culture that suggests that all its members adhere to cultural values and practices in the exact same way and to the same extent. There is variation in cognition and behavior both within and across cultures. Individual differences within cultures stem from many sources including age, interests, capabilities, and other aspects of psychological functioning, such as emotionality. These variations provide one of the sources of complexity inherent to culture, which contributes in important ways to the diversity of thinking that can help a culture address new and unexpected challenges (D'Andrade, 1984; Goodnow, 1990).

To summarize, for over a century there has been interest among psychologists in the relation between culture and human cognition. After many years of research, several interesting ideas have taken shape about how to conceptualize and approach this topic. Research has made it clear that cognition has complex and deep connections to the cultural context in which an individual lives. This is because the cultural context provides the social processes, tools, practices, and institutions that support and guide cognition and its development (Gauvain & Perez, 2015a). In considering research on culture and thought, it is also important to understand that cultures are not static. They change over time as people and their environments change. And, lastly, it is worth remembering that human beings may belong to and move between many different cultures, or systems of meanings, at the same time—a phenomenon that is increasingly evident today in the context of widespread globalization.

# 19.2 Defining the Relation of Culture and Cognition

Human beings learn to think about and solve problems in their everyday lives with the support and guidance of practices and resources that have been developed by their culture over time, continue to be used, and are passed across generations. This type of social learning is called cumulative cultural evolution (Boyd & Richerson, 1996). It is the process that enables human beings to create resources and tools that support and extend human activity, including thought processes, and for these resources and tools to be used by subsequent generations in the same or a modified form. This retention of successive modifications, referred to as the ratchet effect, is maintained by culture and enables improvements to accumulate over time. As Tomasello (1999) explains,

"some individual or group of individuals first invented a primitive version of [an] artifact or practice, and then some later user or users made a modification, an 'improvement,' that others then adopted perhaps without change for many generations, at which point some other individual or group of individuals made another modification, which was then learned and used by others, and so on over historical time in what has sometimes been dubbed 'the ratchet effect'" (p. 5).

As this quotation makes clear, human beings are active agents in this process as they adopt and adapt cultural practices and ways of thinking to meet their current needs (Tomasello, 1999).

Few would dispute the fact that the content of thought varies across cultures. Less clear is what it means to state that processes of cognition, such as attention and memory, differ across cultures. It is important to understand that this is not the same thing as saying that different groups of human beings possess fundamentally different intellectual functions. Basic intellectual functions are shared across cultures and attest to our integrity as a species. All human beings perceive stimuli, remember things, solve problems, engage in social interaction, develop and use tools to support human activity, are self-aware and so forth. However, social and cultural experiences contribute to the form these processes take in any particular instance or setting. As a result, for any given psychological function there are both commonalities and differences across cultural communities.

Consider an example from color perception. Because all intact human brains have the same visual system and photoreceptors, color perception is, as far as we know, invariant across members of the species and emerges on a similar developmental course in early infancy (Franklin, Pilling, & Davies, 2005). However, cultural and linguistic experience determines a number of factors related to color perception and categorization. The number of colors identified by a single color term, how hue is classified, and the valence or preference for certain colors vary across cultures in relation to the words used in the language to denote and categorize colors (Johnson & Hannon, 2015). And, although some languages possess more color terms than others, the sequence in which new terms are added to the language appears to be uniform (Rosch, 1977). Thus, both universal and culturally specific patterns in the perception and classification of color have been found. Such patterns suggest that even in basic cognitive processes such as color perception, we see cultural variations on a common theme.

# 19.3 Thinking in Niches

One way to trace out the cultural contributions to human thinking is to identify the means by which culture becomes part of an individual's knowledge and thought processes. To describe this process, Gauvain (1995) built on ideas put forth by Super and Harkness (1986) in their conception of the developmental niche. In their approach, Super and Harkness adapted a concept from biological ecology, the ecological niche, to describe in a single framework how social-psychological experience connects directly to culture over the course of human development. Super and Harkness identified three subsystems of the developmental niche: the physical and social settings of development, customs of child care, and the psychology of caregivers.

In extending this idea to describe human cognition and its development, Gauvain (1995) identified three subsystems of culture: (1) conventions for organizing and communicating knowledge, (2) material and symbolic tools that facilitate thinking and problem solving, and (3) cultural practices and social institutions (Table 19.2).

Table 19.2: Subsystems of culture that contribute to human knowledge and thought processes.

Each of these subsystems relies in important ways on social interaction as a primary means by which culture and cognition become connected to one another. However, each also includes less interpersonally direct, but still fundamentally social, processes that contribute to the acquisition, organization, and use of cognitive functions through the use of historically, or culturally, formulated tools and resources for understanding the world and solving problems. In this section, these three subsystems are described and illustrated with research on spatial cognition.

# 19.3.1 Conventions for Organizing and Communicating Knowledge

An important aspect of human cognition is organizing and communicating knowledge in understandable ways to others. These skills not only help people structure their knowledge for effective use, they also connect members of a community to one another. Examples are schemas and scripts, which are abstract representations that connect pieces of information into an overarching organization (Bobrow & Norman, 1975; Nelson, 1993; Schank & Abelson, 1977). Scripts, for example, include the order or sequence in which actions are expected to happen and how one should behave in a situation (e.g., going to a restaurant). Even infants and toddlers organize their knowledge of routine events, such as bathing, along script-like lines. By the end of the first year, infants use temporal information in recalling events such as Teddy Bear's bath: first put Teddy in the tub, then wash him with a sponge, then dry him with a towel (Bauer et al., 2000). By 20 months of age, if toddlers are told about a familiar event in which the order of actions is violated, they will correct it (e.g., "No, wash Teddy before drying him") or say, "That's so silly." These ways of organizing complex information are valuable to cognitive functioning. They support memory by aiding recall of events and they can be used to plan or guide behaviors to reach a goal, for example, what to do to get ready for school or work in the morning. And, similar to routinized actions or habits, schemas and scripts aid learning and problem solving by freeing up mental space for new or challenging activities.

There are a number of examples of organizing and communicating spatial information that reveal cultural contributions to this process. Research conducted in Western cultural settings has found that when adults describe spatial information, they tend to use structured narratives that resemble route-like directions and that capture the temporal and spatial contiguity, or relatedness, of areas in the space, almost as if someone is taking an imagined walk through it, or what is called a "mental tour" (Linde & Labov, 1975). From early to middle childhood, children's descriptions of large-scale space come to resemble this type of mental tour (Gauvain & Rogoff, 1989). However, cultural values determine which information is important to include, and this information is found in descriptions produced even by young children. For example, the route directions of Iranian preschoolers living in Britain include more vivid and fuller accounts of sites along a route and less directional information than the directions of same-age British children living in the same region (Spencer & Darvizeh, 1983). This difference suggests that as early as three years of age, children are beginning to display some of the values of their culture when communicating spatial information to others.

There is also evidence that cultural ways of communicating spatial information affect the process of thinking about space and wayfinding (Peterson, Nadel, Bloom, & Garrett, 1996). In some languages absolute directions are used to describe spatial relations. The Guugu Yimithirr are a case in point. They are an Aboriginal community in eastern Australia and the language these people use to describe spatial relations does not rely on relativistic terms, such as left, right, up, and down (Levinson, 1996). Rather, they describe spatial information in absolute terms in accord with cardinal directions, such as north, south, east, and west. In a series of studies that involved asking speakers of this language to point to out-of-sight locations (called dead reckoning) in the desert and to reproduce the arrangements of objects on table tops in adjacent rooms, Guugu Yimithirr speakers identified and reconstructed spatial information according to the absolute rather than the relative positioning of objects. Thus, even when they were not speaking, they behaved in ways consistent with the communicative conventions in their culture for describing space. The rapidity and precision with which the participants provided absolute spatial information on these tasks led Levinson to conclude that their spatial encoding reflected an orientation consistent with the linguistic form. Although examples of this sort are rare, similar communicative and cognitive systems have been found in other cultures, such as the Tzeltal Maya (Levinson, 2003) and Tongans in Polynesia (Bennardo, 2014).

# 19.3.2 Material and Symbolic Tools That Aid Thinking

Material and symbolic tools and resources are developed and used by cultures to guide and support mental activity and, as such, they play a central role in the development and organization of cognitive skill. This view, developed by Vygotsky (1987) and other Activity Theorists (Wertsch, 1981), suggests that tools and symbols mediate the origin and conduct of human activity and, thereby, connect the human mind not only with the world of places and objects but also with other people and their cultural history. Thus, by acquiring and using culturally developed tools for thinking, a person links their mental functioning to the sociohistorical means and understandings transmitted through these tools and symbols. Cole and Griffin (1980) refer to these tools and symbols as cultural amplifiers, that is, techniques or technological features provided by a culture that alter the approaches individual cultural members use in solving problems posed by their environment.

Material and symbolic tools play an important role in spatial thinking because they extend cognitive capabilities by allowing people to describe and use large-scale space in ways that would not be possible without the tools. That is, these tools not only aid thinking, e.g., by easing navigation and travel, they also transform thinking and behavior. For example, an individual may attend to and remember directions to a location differently depending on whether pencil and paper or GPS technology is at hand. In this way, the availability of tools determines how individuals attend to and store information, in other words, the very cognitive processes that are used in carrying out an activity and in learning about the world.

The most widely studied cultural tool of spatial thinking is the map, which functions as both a memory store and a tool for action. Children's skill at devising, understanding, and using maps increases from early to middle childhood (Liben & Downs, 2015). Research shows that preschool children have a basic understanding of what maps represent (e.g., they understand that maps depict locations) and how they can be used (e.g., to find a place in space), but they misunderstand many of the symbolic aspects of maps (e.g., they expect that a road shown as red on a map is actually red; Liben, 2009). It is not until middle childhood, when children are formally introduced to maps in school, that they begin to develop a more sophisticated understanding of maps (Uttal, 2005). Full competence at reading and using maps may not be achieved until adolescence or later, depending on the opportunities available for developing these skills (Presson, 1987). Some very important or highly specialized maps, such as those representing the location of secret and valuable places (e.g., water sources) that are carved on weapons, rocks, and the human body by the Ngatatjara people of the Australian desert (Gould, 1969) or maps representing state or national electric grid systems, may be inaccessible to most people in a culture.

How does experience with maps relate to cognition? Research shows that this experience helps people obtain insights about large-scale space that would not otherwise be possible (Liben, 2001). It also suggests that people's ability to use maps not only reflects their particular spatial representational skills, but also the individual's experience and practice with a system of representation or tools available in their culture. Or as Uttal (2005) put it, skill at using maps to navigate in space results from living in a map-immersed culture. Because learning how to understand and use maps is a social and communicative process, people need to be taught what representations in maps stand for and how they can be used. Such skills are highly valued in cultures with these tools. In fact, recent innovations in STEM (Science, Technology, Engineering, and Mathematics) learning include introducing young people in such cultures to map use across a diverse range of spatial contexts and technologies (Committee on Support for Thinking Spatially, 2006).

Cultural symbol systems, such as numeracy and language, also contribute to spatial thinking. Much of the research that examines language in relation to spatial cognition is centered on testing the idea proposed by Whorf (1956) that language affects the ways in which speakers conceptualize the world and even their nonlinguistic cognitive abilities. Results suggest that variation across languages in the categorization of spatial concepts contributes to cultural variation in spatial understanding. For instance, research conducted by Bowerman and colleagues (e.g., Bowerman & Choi, 2003; Majid, Bowerman, Kita, Haun, & Levinson, 2004) found that culturally specific reading patterns can influence performance on seemingly unrelated tasks. In one study, participants spoke and read either English or Mandarin; English text is written in a left-right pattern, whereas Mandarin text is traditionally written vertically. When participants were asked to describe how they thought about the future, English readers described the future as occurring in a forward direction and the past in a backward direction, while Mandarin readers described the future as occurring in an upward manner and the past in a downward manner.

Research has also found that language is related to cultural differences in preferences for particular frames of reference in describing space. Majid and colleagues (2004) identified three frames of reference: (1) relative, which involves use of the viewer's own perspective (e.g., the spoon is to the right of the fork); (2) absolute, which uses an external framework (e.g., the spoon is to the north of the fork); and (3) intrinsic, which uses the relationship of the items themselves without reference to personal or external coordinates (e.g., the fork is at the nose of the spoon). The frequency of using these frames of reference differs across languages. English speakers are more likely to use relative and intrinsic frames, while the aforementioned Guugu Yimithirr speakers from Australia exclusively use absolute frames of reference. Similarly, Haun, Rapold, Janzen, and Levinson (2011) found that Dutch and Namibian elementary school children (ǂAkhoe Haiǁom speakers) also differed in their spatial frames of reference. Dutch children were more likely to use relative descriptions, whereas Namibian children were more likely to use absolute descriptions. In addition, when the children were instructed to use their nondominant frame of reference, they had great difficulty in doing so and performed poorly. Thus, spatial cognition and language covary across cultures in systematic ways.

The symbols and tools that cultures devise and use to represent and support thinking are not static. They change over time and may do so in a rather sweeping fashion. Recently, there have been a number of major changes in the tools people use to imagine, communicate about, and experience large-scale or geo-space, including geographic information systems (GIS), global positioning systems (GPS), and geo-visualization tools (GeoVis). Downs (2014) describes these changes as revolutionary because of their potential to affect the development and use of spatial cognition along with people's understanding of and relation to the world as a whole. The extent of the impact is, as of yet, unknown. What is known is that people are adopting these technologies at a rapid pace and their use is both widespread and regular. People use handheld spatial navigation devices on a daily basis for moving around the world in vehicles and on foot. Even people living in geographically isolated communities in the Majority World use these tools, accessed mainly on mobile or cell phones (Mpogole, Usanga, & Tedre, 2008). Although most people in remote regions report purchasing these phones for social and emergency contact, the phones are also used to help people carry out activities that are spatial in nature. For instance, they help rural villagers living in very spread-out regions make decisions important for their livelihood, such as where to find clean water for livestock and household use.

Downs (2014) identifies some potential downsides to adopting these technologies that warrant more attention from researchers. For instance, he asks, how do people evaluate the quality and utility of the spatial information provided by these technologies? Do people monitor their activities as they rely on this information to be certain it is helpful or correct? Downs is also concerned about dependency. These tools, without question, can afford greater ease and flexibility for people when traveling, especially in distant or unfamiliar places. Yet users may become dependent on them, which may, in turn, lead to an abandonment of more traditional methods of thinking about and using space. These changes would, inevitably, reduce the likelihood that traditional methods of spatial thinking and representation are transmitted across generations.

Taken together, this research supports the view that symbolic and material tools devised and used by a culture are integrated with the development and use of spatial thinking skills. These cultural tools alter how individuals solve spatial problems, and as a result, they transform spatial cognition. However, their contribution to spatial thinking is complex and provides both opportunities and constraints. Tools, such as maps, and symbolic systems, including language, can provide ways of solving spatial problems that would not be possible without these resources. However, at the same time, these tools constrain spatial problem solving and what people know about space. For instance, people's understanding of the geography of London is more reflective of the spatial layout depicted in the map of the city's underground subway system than it is of the city itself (Roberts, 2005). Here we are reminded of our earlier discussion about how to interpret an individual's success or failure when asked to solve a problem or do a cognitive task. The body of research just described demonstrates that when a person is asked to solve a spatial problem that is integrated with a cultural tool, symbolic or material, the person's performance will reflect not only the individual's inherent cognitive skills, but also their experience with the symbols and tools of their culture.

# 19.3.3 Cultural Practices and Institutions

Culture provides institutions and other formal and informal social settings and arrangements, including rituals and routines, that facilitate and guide human thinking (Goodnow et al., 1995). Formal institutions are designed to train people in the valued skills and practices of their culture. School, for instance, promotes and supports the development of particular approaches and methods that are valued in the culture, such as literacy and numeracy (Serpell & Hatano, 1997). The relation between schooling and cognitive development is well known. What is important for present purposes is how experience in school includes practice and skill development in culturally valued areas and how these experiences carry over into everyday thinking. For instance, schooling contributes to the development of spatial thinking through the skills that are emphasized and practiced there. The types of measurement and precision promoted in schools are evident in the degree of accuracy seen or expected in people's everyday distance estimation, model replication, and map use in cultures that value these skills. This degree of precision is less common in spatial representations and memory among people living in some other cultural communities, even though these individuals exhibit high levels of spatial skill (Gauvain, 1998). Other highly skilled ways of characterizing space may emphasize configurational information (where places are relative to one another) or information about changing landscape conditions (due to seasonal or other types of climatic factors) that can alter the texture and dimension of a terrain and affect travel time or safety.

Culture may also influence spatial memory and use through more formalized traditional practices for exploring and traversing large-scale space. Traditional Puluwat seafarers in Micronesia have developed a navigational system that does not rely on modern instruments. Rather, these navigators learn a complex set of principles to guide their travels (Gladwin, 1971; Hutchins, 1983). Some of this information is directly observed, such as wave patterns, and other parts are inferred, such as the sidereal (star) compass. The sidereal compass is an abstract mental reference system of 32 star paths that defines the courses or routes of travel among islands. This huge memorization task is eased by the use of cultural myths as mnemonics, or memory aids (Hage, 1978). The remarkable skill of traditional Puluwat navigators relies on this knowledge of star paths. Similar to most knowledge of familiar local space, star paths are not fixed map routes or action sequences; rather, they are a reservoir of possible action plans for solving spatial navigational problems. Locomotion, either real or imagined, provides information about landmarks and actual or potential routes, as well as immediate cues (e.g., direction, winds, tides, currents, bird patterns) that are used to update and adjust spatial orientation and route finding in real time.

Other institutions of culture, such as rituals and routines, also play important roles in cultural learning. By definition, rituals and routines entail unchanging and prescribed patterns or sequences of actions that are deemed important in the culture (Goodnow et al., 1995). These action sequences are displayed on a regular and predictable basis, and as such, children have ample opportunity to learn about them via observational and participatory means. Children also learn about their cultural significance, often in the context of family life, which enhances motivation to learn about them and carry them out (Fiese, 2006). Even early in life, children have a role in cultural rituals and routines and their role changes with development, typically in the direction of increased expectations of independent performance and responsibility (Rogoff, 2003).

Do cultural practices affect the development of spatial thinking skills? In a study comparing the spatial skills of Australian Aboriginal children reared in the desert and European Australian children reared in the city, Kearins (1981) found that the Aboriginal children performed far better on all the spatial location tasks presented to them. This result echoes the consistent finding that increased experience in an environment enhances memory for space and aids spatial orientation (Liben & Christensen, 2010). Cultures differ in the opportunity children have to explore space during everyday routine activities, which has consequences for spatial thinking and its development. For example, research conducted in the Logoli community in Kenya found a relation between the distance children played from their village and their skill on spatial tasks (Munroe & Munroe, 1971). Children's directed distance from home (that is, travel undertaken while engaging in an activity away from the home area, such as herding, running errands to neighboring villages, or weeding crops in the field), and not their free-time distance from home (e.g., playing in activities not defined or directed by adults), was the important contributor to spatial skill on several tasks (Munroe, Munroe, & Brasher, 1985).

Less formal social institutions and social settings also influence spatial thinking. In cultures where verbal explanation is highly valued, cultural practices reflect this value in the form of oral narratives and storytelling. These practices assume much importance and are part of everyday experience and cognitive exchange that children have with older children and adults (Heath, 1983). For example, research shows that children are introduced to and learn about cultural ways of conceptualizing and representing space and how to use these representational forms by interacting with their caregivers. Szechter and Liben (2004) found that mothers' use of spatial language during picture book reading with 3- to 5-year-old children predicted children's success on a spatial task that involved spatial-graphic representations (i.e., understanding of graphically depicted distance). Adults also guide children in exploring new environments and they help children learn spatial routes of travel (Spencer & Blades, 2006).

Researchers have also studied how variation in cultural practice, such as access to aerial views of the earth, relates to how individuals come to understand and solve spatial problems (Blaut, McCleary, & Blaut, 1970; Spencer & Blades, 2006). Hund, Schmettow, and Noordzij (2012) discuss two wayfinding strategies or perspectives: (1) route perspectives, or first-person mental tours, that provide information such as left and right turns and landmark descriptions; and (2) survey perspectives, or third-person perspectives that involve considering the entire travel space at once (e.g., aerial views) and use cardinal directions (e.g., north, south), precise distances, and specific locations. The researchers found that participants from the Midwestern United States tended to use a survey perspective whereas participants from the Netherlands tended to use a route perspective. To explain this difference, the researchers considered the ecological factors of the two regions: whereas the Midwestern United States is characterized by grid-like property boundaries, the Netherlands uses more natural features to define boundaries. Thus, spatial frame of reference is shaped by the confluence of experience in the environment and cultural conventions that have been developed over time for describing a space. These conventions take time to learn, and this learning relies on guidance and support from others in the community.

Finally, although directional information in language may seem clear, research indicates that it is not possible to know which directional framework a person is using from the literal meaning of a directional term. Frake (1980) describes how one needs to understand cultural practices to interpret absolute directions (e.g., north, south, east, west) and contingent directions (e.g., left-right, forward-behind). For instance, in traditional navigation in Southeast Asia, 'south' is often used to refer to 'seaward' rather than 'landward', not to true south. If this seems puzzling, consider a more familiar example. California has a jagged coastline, and in many places the Pacific Ocean is actually to the north or south. Nonetheless, the ocean is conventionally described as being to the west. In both examples, the terms 'south' and 'west' are not veridical, or true, descriptions of the world, but rather concepts or ideas for referring to the world within a particular cultural frame of reference or practice. In order to know what directional framework a person is using, even when the terms used seem unequivocal in their spatial information, it is necessary to know the cultural context for using and interpreting this information. Stated more generally, to understand human spatial thinking it is necessary to attend to the cultural practices people use to guide their exploration, memory, and communication about large-scale space.


# Hot Topic: What will spatial cognition be like in the future?

Mary Gauvain

Globalization is a pervasive force that is increasing connections across societies and cultures and rapidly transforming people and places around the world. A principal feature of globalization is the integration of technology and other resources typically encountered in industrialized settings. These societal-level changes are significant for human cognition because they affect, on a daily basis, the work people do, the way children are cared for and educated, and the nature and strength of links between the community and the world beyond it. Thus, both inside and outside the home, these changing conditions of life expose people to new and recurrent modes of acting, interacting, and learning that have direct relevance to psychological functioning.

Research shows that cultural tools contribute in meaningful ways to spatial thinking. Thus, a reasonable question to ask is: what might spatial cognition be like in the future? One of the major changes taking place today is the emergence of technologies that help people imagine, learn about, and explore large-scale space. Many of these changes are due to changing map technologies (e.g., geographic information systems, or GIS; the Global Positioning System, or GPS), and their impact on society is widespread and occurring at a rapid pace (Downs, 2014). These changes are not only affecting adults in communities; children also learn to use them. In fact, they may be the primary or only way many children today are learning to navigate in space. If this is true, these tools will introduce a new mode of thinking about and using space in the community going forward. The fact that these tools did not originate in many of the cultures adopting them is also an important part of this story. Furthermore, the rapid pace at which these technologies are being adopted may be destabilizing. Research has found that rapid, widespread change in a community can produce a breakdown of traditional cultural systems, difficulties for individuals in adjusting to the changes, and in some instances an increase in individual pathologies (Bodley, 1982; Munroe & Munroe, 1980).

Geospatial technologies connect people to the world beyond the community in many new, exciting, and also unknown ways. Unlike earlier tools for navigation, which often emerged from within the community itself and therefore were shaped to local needs and values, community members are not involved in creating the geo-technology information that is used to guide their spatial activities. As Downs (2014, p. 9) explains, "While users have options, the shape of the world is set by hardware and software designers. To the extent that we accept default settings of devices as given, our experience of the world is dictated by others." Thus, using the default settings on these devices brings benefits, but also tradeoffs for human spatial thinking. Research is needed on the societal-level changes that result from the adoption and use of technologies supporting spatial activity, and on how these changes may affect spatial thinking in the future.

#### References

Bodley, J. H. (1982). *Victims of progress*. Menlo Park, CA: Benjamin/Cummings.

Downs, R. M. (2014). Coming of age in the geospatial revolution: The geographic self re-defined. *Human Development*, *57*, 35–57. doi:10.1159/000358319

Munroe, R. L., & Munroe, R. H. (1980). Perspectives suggested by anthropological data. In H. C. Triandis & W. W. Lambert (Eds.), *Handbook of cross-cultural psychology* (Vol. 1, pp. 253–317). Boston, MA: Allyn & Bacon.



# Subject Index

abstraction, 56 accommodation, 349 Activity Theorist, 364 affect, 341–354 Affect as information theory, 343 affect congruence, 341–344, 347 Affect Infusion Model, 342, 344 affect-state dependence, 345 age, 316 alignment in dialogue, 204 analogical reasoning, 143 anchoring effect, 186 artificial grammars, 200 assimilation, 349 associationism, 20 associative network theory, 342 availability heuristic, 137 balance theory of wisdom, 316 basic level of categorization, 63 behavioral genetic research, 244 behaviorism, 21, 341 Berlin wisdom paradigm, 311 bias belief bias, 33, 116 confirmation bias, 137, 184 hindsight bias, 185 linguistic bias, 205 matching bias, 120 metacognitive bias, 96 status quo bias, 185 bilingual, 225 bilingualism, 201 biological maturation, 336 bottom-up processing, 224 brainstorming, 285 bullshit receptivity, 352, 357 calibration, 96

categorization, 214 category, 55 category-based induction, 139 Cattell-Horn-Carroll model, 259 causal hypothesis, 28 causal induction, 141 chess, 240 children, 146 classical approach, 57 cognition, 215 cognitive flexibility, 280 processing, 224 socialization, 363 cognitivism, 21, 341 collaboration, 286 color, 218 color perception, 367 common ground, 205 communication, 200, 354 interpersonal, 353 communicative convention, 368 complex skill(s), 235 computational model, 46 concept, 55 conditional inference, 117–119 conditioning, classical, 342 confidence, 92 consciousness, 219 consumption, 292 content validity, 261 convergent thinking, 279, 283 conversation, 78 core knowledge, 330 correlation study, 27 creative problem solving, 285 process, 290

creative potential, 279 creativity enhancement, 293 multivariate approach, 278, 280 techniques, 295 creator, 277 cross -cultural approach, 364 -linguistic studies, 224 cultural amplifiers, 369 history, 369 psychology, 364 symbol systems, 370 tool, 363 culture, 363 cumulative cultural evolution, 366 data analysis, 4 decision making, 223, 327 declarative knowledge, 222 deduction paradigm, 125 deeper levels of comprehension, 72 deficit interpretations, 365 deliberate practice, 239, 240 dependent variable, 27 development, 316 diagnostic task, 31 dialectic, 16 dialogue, 203 directional framework, 373 divergent thinking, 279 domain, 6 -generality, 6 -specificity, 6, 280 drift diffusion model, 35 dual-process theory, 123, 124 education, 266, 336 effort regulation, 91 Einstellung effect, 223 eliminative induction, 137 embodied cognition, 78 embodied language, 203 emotion, 341, 342 emotional intelligence, 260 empiricism, 4, 15

endowment effect, 185 enumerative induction, 137 ethnocentrism, 366 evidence accumulation, 35 executive function, 202 exemplar approach, 58 exemplar of wisdom, 308 experience, 336 experiment, 4, 28 expertise, 235, 281, 308, 336 extended cognition, 224 eye tracking, 41, 206 area of interest, 42 heatmaps, 42 scanpaths, 42 factor analysis, 257, 258 fallacy, 316 base-rate fallacy, 186 conjunction fallacy, 186 sunk-cost fallacy, 185 false belief task, 331 feeling(s) of rightness, 128 Flynn effect, 266 fMRI, 46 FOR, 128 foreclosure, 334 foreign language, 225 effect, 225 frame of reference, 218, 370 framing, 225 functional fixedness, 145, 223 functionalism, 19 g-factor, 256, 257 gender, 214 gender stereotype, 206 gene-environment correlation, 243 interaction, 243 interplay, 243 general intelligence, 124 genotype, 244 geo-visualization tools (GeoVis), 370 geographic information systems (GIS), 370 Gestalt, 21 global positioning systems (GPS), 370

globalization, 366 goal setting, 328 goal-action procedure, 75 GPS technology, 369 graded structure, 57 grammar, 218 gullibility, 352, 356 heuristic, 344 affect heuristic, 184 availability heuristic, 184 cues, 93 recognition heuristic, 184 higher order cognitive processes, 330 home-sign, 221 hypothesis, 4 hypothetical thinking, 113 identity achieved, 335 identity development, 334 identity diffused, 334 idiosyncrasy, 279, 282 illusory correlation, 137 implicit learning, 200 incubation, 282 independent variable, 27 individual differences, 255, 256, 366 induction development, 147 inductive evidence, 146 information board, 39 information processing strategy, 342 information search, 39 innovation, 287, 292 insight, 145 intelligence, 4 crystallized, 6, 257, 258, 261, 264 fluid, 6, 257, 258, 260–264 gene, 4 practical, 309 training, 266, 267 intervention, 315 introspection, 28, 29 introspection, criticism of, 29 IQ, 124

joint method of agreement and disagreement, 142 judgment, 100, 346, 350

prospective, 97 retrospective, 97 judgment accuracy, 96 Judgment of Learning, 91 knowledge, 281 procedural, 7 knowledge base, 327, 328 knowledge component, 72 knowledge representation, 71 language, 199, 213, 328 acquisition, 200 comprehension, 202 processing, 353 production, 203 leadership, 288 learning, 71 life experience, 309 linguistic determinism, 216 relativity, 214 tool, 218 logical intuition, 129 tautology, 334 logical contradiction, 334 logical form, 334 maximal cognitive effort, 255 measurement model, 33 measures of wisdom, 309 media, 81 memory, 224, 343–345 mental flexibility, 279 mental logic, 122 mental model theory, 122, 123 mental model(s), 122, 123 mental representation, 71, 328 mental state, 223 mental tour, 368 meta-analysis, 240 meta-memory, 90 meta-reasoning, 90 metacognition, 89 metacognitive control, 89 metacognitive myopia, 101

method of agreement, 141 of concomitant variation, 142 of disagreement, 141 methodological behaviorism, 29 mind-body dualism, 16 mnemonic, 371 molecular genetic, 245 monitoring, 89 mood, 341, 344–347, 349–353 moral dilemma, 225 moratorium, 335 MORE Life Experience Model, 318 mother tongue, 223 mouse-tracking, 44 multifactorial perspective, 235 multilingualism, 201 multiple intelligences theory, 259 music, 240

narrative, 368, 372 native language, 223 naturalistic decision making, 9 nature - nurture, 4, 236 navigational system, 371 neural network, 77 neuroimaging, 100 new paradigm, 125 new paradigm psychology of reasoning, 125 niche, developmental, 367

object permanence, 328 open-ended measure, 314 openness to experience, 279 operationalization, 27 originality, 291 overconfidence, 93

perception, 218 performance cognitive, 349 personal wisdom, 312 personality, 309 phenotype, 244, 246 philosophy, 341 planning, 328 positron emission tomography, 8 PPIK theory, 257 predictive validity, 265 premise monotonicity, 136 primacy effect, 350 primary mental abilities, 257 problem solving, 223, 327 procedural knowledge, 222 process creative, 282 prototype approach, 58 psychoanalysis, 342 psychology of human thought, 3 Ramsey test, 126 ratchet effect, 366 rational thinking disposition, 124 rationalism, 4, 15 reactivity, 29 reasoning, 223, 262–264, 327 reductionism, 18 reflection, 309 reliability, 311 representation, 200, 216 representational effect, 222 representativeness heuristic, 139 resolution, 96 response time analysis, 34 risk taking, 279, 281 ritual, 371 routine, 371 schema, 368 school, 371 script, 368 selection task, 119, 125 selective attention, 58 self-report scale, 312 sensorimotor stage, 328 set effect, 223 sex differences, 269 similarity, 56 social behavior, 343, 347 social learning, 366 sociocultural approach, 290, 364 sociocultural developmental approach, 286 sociohistorical means, 369 spatial

cognition, 363 memory, 371 orientation, 372 problems, 372 relations, 368 representational skills, 369 thinking skills, 372 spatial structure, 75 speed-accuracy tradeoff, 35 STEM, 369 stereotyping, 351 strategies, 3 strategy index, 41 study, longitudinal, 337 subtraction method, 34 supervised category learning, 61 syllogism, 114 syllogistic reasoning, 114 talent, 242 task cognitive, 366 developmental, 366 taxonomic structure, 74 technology, 81

tense, 216 thematic facilitation effect, 120 theory, 3 theory of mind, 223, 331 theory-driven approach, 60 thinking, 327 thinking-aloud-method, 29 thought, 213 Three-Stratum Theory, 258, 259 top-down processing, 224 twin study, 244 two-response task, 128 uncertainty, 309 unexpected contents task, 332

validity, 7, 311 verbal interference, 221

Wason selection task, 119, 125 wayfinding, 368 wisdom, 307 wise reasoning, 312 working memory capacity, 263, 264

# Author Index

# D



Dupont, J. . . . . . . . . 295, 299 Dzindolet, M. . . . . . . . . . . 287


# G


Gigerenzer, G. . . 3, 4, 32, 48, 97, 183, 184, 187 Gilbert, A. L. . . . . . . . . . . 219 Gilbert, D. T. . . . . . . . . . . 350 Gilboa-Schechtman, E. . 345, 346 Gillham, N. W. . . . . . . . . 236 Gippel, J. . . . . . . . . . . . . . . 288 Girotto, V. . . . . . . . . . . . . . 121 Glöckner, A. . . . . . 32, 40, 48 Glück, J. . 307–311, 313–319 Gladwell, M. . . . . . . . . . . 239 Gladwin, T. . . . . . . . 364, 371 Glaholt, M. G. . . . . . . . . . . 42 Glaser, M. . . . . . . . . . . . . . 100 Glaveanu, V. P. 283, 284, 290 Gleitman, H. . . . . . . . . . . . . 57 Gleitman, L. R. . . . . . . . . . 57 Glenberg, A. M. . . . . 78, 202 Glick, J. . . . . . . 331, 332, 364 Glimcher, P. W. . . . . . . . . 181 Glodowski, A.-S. . . . . . . . 157 Glover, G. . . . . . . . . . . . . . 177 Glucksberg, S. . . . . . . . . . 223 Gobbi, N. . . . . . . . . . . . . . 202 Gobet, F. . . . . . 235, 240, 241 Gödker, M. . . . . . . . . . . . . 169 Godman, N. . . . . . . . . . . . 134 Goel, V. . . . . . . . . . . . . . . . 296 Goeleven, E. . . . . . . . . . . 345 Gokalsing, E. . . . . . . . . . . . 92 Goldberg, A. B. . . . . . . . . . 78 Goldenberg, L. . . . . 349, 350 Goldin-Meadow, S. . . . . . 226 Goldman, S. R. . . . . . . . . . . 71 Goldsmith, K. . . . . . . . . . . 181 Goldsmith, M. . . . . . . . . . . 93 Goldstein, D. G. 32, 183–185, 187 Goldstone, R. L. . . . . . . . . 63 Goldwater, M. B. . . . . . . . . 63 Golfinos, J. G. . . . . . . . . . . 98 Göncü, A. . . . . . . . . 364, 365 Gonen-Yaacovi, G. . . . . . 295 Gonzalez, P. . . . . . . . . . . . 345 Goodnow, J. J. . 15, 365, 366, 371, 372


# I

Ivry, R. B. . . . . . . . . . . . . . . . 8


# Q

Quinteros Baumgart, C. . 202


# T



Thomas, S. . . . . . . . . . . . . 314 Thompson, V. A. . 90, 94, 97, 128 Thomson, G. H. . . . . . . . . 257 Thorndike, E. L. . . . . 20, 237 Thornhill-Miller, B. 288, 295, 298, 299 Thurstone, L. L. . . . . . 6, 257 Thurstone, T. G. . . . . . . . . 257 Titchener, E. . . . . . . . . . . . . 19 Tobin, L. . . . . . . . . . . . . . . 104 Tolman, E. C. . . . . . . . . . . . 29 Tomasello, M. . . . . . 201, 366 Topolinski, S. . . . 94, 95, 145 Tordjman, S. . . . . . . . . . . . 298 Torrance, E. . . . . . . . . . . . 283 Townsend, J. T. . . . . . . . . . 35 Trahan, L. H. . . . . . . . . . . 266 Trautwein, U. . . . . . 261, 266 Treffinger, D. J. . . . . . . . . 295 Trofimovich, P. . . . . . . . . 205 Trzaskowski, M. . . . . . . . 245 Tschan, F. . . . . . . . . . . . . . 156 Tucker, D. M. . . . . . . . . . . . 98 Tucker, R. . . . . . . . . . . . . . 240 Tucker-Drob, E. M. 244, 245, 266 Tuermer, U. . . . . . . . . . . . 365 Tuholski, S. W. . . . . . . . . 264 Turkheimer, E. . . . . . . . . . 266 Turner, T. J. . . . . . . . . . . . . . 75 Tversky, A. 56, 137, 139, 179, 180, 184–187, 225 Tyler, L. K. . . . . . . . . . . . . . 64


# X

Xu, X. . . . . . . . . . . . . . . . . 104


The *Psychology of Human Thought* is an open-access collection of peer-reviewed chapters covering all areas of higher cognitive processes. The book is intended as a textbook for courses on higher processes, complex cognition, human thought, and related topics. Chapters cover concept acquisition, knowledge representation, inductive and deductive reasoning, problem solving, metacognition, language, expertise, intelligence, creativity, wisdom, the development of thought, and affect and thought, along with sections on history and methods. The chapters are written by distinguished experts in their respective fields from regions as diverse as North America, Great Britain, France, Germany, Norway, Israel, and Australia, and are addressed to advanced undergraduates and beginning graduate students.